







Better Teaching Through Testing 




Better Teaching 
Through Testing 

A PRACTICAL MANUAL FOR THE 
PHYSICAL EDUCATION TEACHER 

M. Gladys Scott, Ph.D. 

Associate Professor of Physical Education, State University of Iowa 

Esther French, Ph.D. 

Professor of Physical Education, Head Department of Physical 
Education for Women, Illinois State Normal University 



A. S. BARNES AND COMPANY, NEW YORK


Copyright, 1945, by 

A. S. BARNES AND COMPANY, INCORPORATED 


This book is fully protected by copyright and nothing that 
appears in it may be reprinted or reproduced in any man- 
ner, either wholly or in part, for any use whatever, with- 
out special written permission of the copyright owner. 


This book has been manufactured in 
accordance with the regulations of 
the War Production Board. 


PRINTED IN THE UNITED STATES OF AMERICA 



Preface 


Physical Education has reached the stage where 
individual diagnosis is recognized as an essential step in the teach- 
ing process. The mere recognition of the importance of testing, 
an integral part of individual diagnosis, does not guarantee the 
translation of the idea into practice. All too often tests have either 
not been available or were too long and complicated, too extrava- 
gant of class time, or had no proven value. A simplified presenta- 
tion should be helpful for the teacher who has not specialized 
in testing procedures. 

The problem of testing has been accentuated by wartime conditions:
by the necessity of showing results and of obtaining them quickly,
by the necessity of mass testing, by requests for measures of
qualities other than specific skills, by the need for standards of
performance, and by attempts to interpret the significance of
inadequate or superior abilities. Such necessities have accelerated
experimentation with tests and their use, and have promoted
an analytical attitude toward testing problems.

It is the purpose of this book to give a non-technical discussion 
of the testing procedure. It is designed as a practical manual to 
be used by the teacher in planning various units. The professional 
student should find it useful as a text, cut down to essentials, and 
aimed at giving a clear perspective on testing as a part of the 
teaching procedure. Our thesis is that testing must be considered 
and used in relation to aims and objectives and that its value is 
dependent upon the benefit derived by the individual pupil. 

All the tests included here have been validated on some group, 
have been used with other groups, and are known to be reasonably 
satisfactory. The majority are of recent origin and many have not 
appeared elsewhere in the literature. No claim is made that they 
are the final answer. New measures are constantly being con- 
structed, some of which will undoubtedly be improvements. However,
these are practical and have proven value, and the student
or teacher can use these suggestions as a basis for evaluating new 
tests as they appear. 

This book aims at giving a background for test construction, 
selection and use. This background is practically identical for 
those conducting physical education programs for either boys or 
girls. A few tests described herein are for girls only. Many others 
have not been adequately tried with boys’ groups, but might be 
found useful with minor adaptations. Few tests are given specifi- 
cally for boys. However, the background provided here should be 
helpful in working with tests from other sources. 

Acknowledgments are made to our many students who have 
served as subjects and often experimented with us on different pro- 
cedures. Special recognition is also given to our graduate students 
who have taken the initiative in developing certain tests. Refer- 
ences are made to their studies throughout the book. They include 
the following: Jean Bontz, Ruth Buchanan, Marian Conelly, 
Bernice Cooper, Dorothy Mohr, Elizabeth Salit, Evelyn Shaufele, 
Margaret Schmithals, Jeannette Smalley, Wilma Kerr Smith, 
Evelyn Sturtz, Audrey Underkofler, and Marjorie Wilson. Appre- 
ciation is also extended to Marjorie Wilson for her aid in statistical 
work. 

We are also indebted to the publishers of various articles and 
books for their permission to use some of their material. Recogni- 
tion is given in the text for such source materials, but acknowledg- 
ments are hereby made to the following publishers: A. S. Barnes 
and Company; F. S. Crofts and Company; Public School Publish- 
ing Company; Research Quarterly of the American Association for 
Health, Physical Education and Recreation; and the Visual Edu- 
cation Department, University of Iowa. 

M. G. S. 

E. F. 



Contents 


CHAPTER

I USES OF TESTS
Introduction
Motivation
Status of Students
Diagnosis
Evidence of Results
Information and Understanding
Testing as a Phase of Teaching
Summary

II CHARACTERISTICS OF A GOOD TEST
Measure Important Abilities
Be Like Game Situation
Involve Only One Performer
Encourage Good Form
Provide Accurate Scoring
Provide Sufficient Number of Trials
Be Interesting and Meaningful
Adjusted to Ability of the Group
Judged by Criteria Applied Statistically
Provide a Means of Interpreting Performance

III ADMINISTRATION OF TESTS
Class Organization
Use of Space
Training the Assistants
Devices for Facilitating Measurement
Score Cards
Presentation of Tests
Knowledge Tests

IV MEASUREMENT OF SKILL IN SPORTS
Introduction
Basis for Selection
Standards or Achievement Scales
Badminton
Basketball
Field Hockey
Soccer
Softball
Speedball
Tennis
Volley Ball

V EVALUATION OF PHYSICAL FITNESS
The Need for Fitness Tests
Selection of Fitness Tests
Tests of Strength
Tests of Endurance
Tests of Agility
Tests of Flexibility

VI MEASUREMENT OF GENERAL MOTOR ABILITY
Motor Ability Defined
A Motor Ability Battery
Scoring the Battery
Evaluating the Battery
Substitutions in the Battery

VII ACHIEVEMENT RATINGS AND PROGRESSIONS
Ratings
Diving
Posture
Dance
Softball Batting
Basketball
Tennis
Achievement Progressions
Swimming
Stunts and Tumbling
Bowling
Archery

VIII CONSTRUCTION OF KNOWLEDGE EXAMINATIONS
Availability
Uses
Sources of Information
Distribution of Content
Choice of Type of Item
Use of Miscellaneous Devices
Check List for Evaluating Items
Planning the Total Examination
Administering the Examination
How to Check on the Effectiveness of Examinations
Revising Examinations
Tests as Teaching Devices

IX SIMPLE STATISTICAL PROCEDURE
Frequency Distribution
Converting Raw Scores into Letter Grades
Averaging Letter Grades
Measures of Central Tendency
Measures of Distribution
Conversion of Raw Scores into Comparable Values
Evaluation of Knowledge Test Questions
Establishment of Criterion Scores through Conduct of Ratings
Procedure for Constructing a Motor Test Battery



I. 

Use of Tests 


Physical education, in common with all branches 
of education, has certain aims and objectives. A list of objectives 
for pupil development through physical education usually includes 
something like the following: health or physical fitness, motor 
skill, knowledge or information, and social adjustment. If such 
concepts are definite enough to be set up as teaching goals, if acqui- 
sition of learning takes place, then the results must be recognizable 
and more or less precise means of evaluation must be possible. It 
is obvious that there is variation in the ease with which the aspects 
of these different objectives may be measured. It is the purpose of 
this book to provide practical information on the areas which may 
be measured most satisfactorily; that means tests for many of the 
motor skills, for knowledge, and for some phases of fitness not cov- 
ered in the medical examination. 

MOTIVATION 

Probably the greatest source of satisfaction in any 
learning situation is the feeling of accomplishment. Both the 
teacher and the student share in this sense of achievement. This is 
one of the reasons that activities such as swimming or stunts gain 
almost immediate enthusiasm; results are apparent at once. Any 
activity can be taught so as to secure these same obvious results 
if practice and self-testing devices are used. For example, does a 
player use a shot from specified points around the basket with in- 
creasing accuracy? Is the softball thrown from base to base with 
greater accuracy? Is a player’s tennis stroke improved so that the 
number of continuous returns made by the player and an opponent 
increases regularly? Such systems of more or less objective evidence 
may be a part of regular practice and serve as a real source of 
motivation. 




The occasional student who lacks that natural interest in self- 
testing of skills will be motivated by the knowledge that tests will 
be given and that the teacher and other students will judge him 
by the skill he demonstrates in these tests. 

STATUS OF STUDENTS 

There are certain times during the learning period 
when it is desirable to know the relative status of the students on 
acquired skills or innate capacities which are pertinent to learning 
of that activity. At the beginning of a sport season it may be 
helpful in sectioning classes or selecting the personnel of squads 
within a class. Most classes can not be made up with individuals 
of highly homogeneous ability; but teaching and competition are 
more satisfactory when the personnel of the teams or squads has 
been determined by objective measures. Such teams may be 
matched for ability, or poorer students separated for special help 
according to the plan of the teacher. 

Again at the end of a unit it may be helpful to know the status 
of students because they usually are interested in their own im- 
provement. Likewise, in most schools the teacher is expected to give 
each student a grade. Grading is a process of rewarding a student 
according to his learning relative to the learning of the other 
students and relative to a subjective standard established by the 
teacher. Since the standards for the extremes of the grading scale 
must remain more or less subjective, objective measures help con- 
siderably in differentiating within this range. They also serve as 
a check on the justifiability of the teacher’s expectations for a 
particular group of students, or for individuals in terms of capacity. 

Further evidence of the level of individual or group performance 
may be secured by making comparisons with performance of stu- 
dents in other schools. Such comparison can be obtained best by 
objective tests. This purpose is claimed for interscholastic com- 
petition in sports, but such competition usually affects only the 
very few at the top end of the scale. Of course, there are advantages 
and disadvantages both in competitive sports and in standardized 
tests and achievement scales. However, if the whole group is to 
be considered, either by way of determining level of ability or 



Use of Tests 3 

stimulation to greater effort, then the testing method is preferable. 

If the teacher has the same students in similar activities on 
successive years, economy in testing time may be achieved by care- 
ful selection of tests and by saving the scores. Thus the scores ob- 
tained in determining ability at the end of the season may be used 
at the beginning of the corresponding unit the following year. 
This will be accurate in classifying students the second season pro- 
viding all have had equal experience in similar activities during 
the interval. It will be adequate for relative status. Probably such 
a procedure will be satisfactory as a basis for measuring the amount 
of improvement during the second unit. However, the measured
improvement may then be somewhat smaller for all students, because
students may be “out of practice” on initial tests and would
therefore score slightly lower than they did at the end of the
preceding season, which is the score used as the starting point.

DIAGNOSIS 


The purpose of diagnostic tests is to single out skills 
which need special attention during instruction. If a choice is to 
be made in materials taught, test results may furnish evidence for 
that choice. Whenever there is opportunity for any individualized 
instruction and practice the diagnostic test is a prerequisite. 

The diagnostic test becomes a practice test if the student repeats 
it regularly with the definite aim of attempting to raise the score. 
If individuals with similar needs are put into one squad then the 
squad may work together. Whether working singly or in a squad 
each person may be practicing on the skill in which he has demon- 
strated the greatest weakness. 

Most tests for measuring status or achievement of the student 
or for motivation sample a few of the skills used in the game. The 
diagnostic battery usually includes a fairly comprehensive list of 
skills, in order to be of the greatest help to the teacher in planning 
the course and of greatest help to all students. On the college level 
the diagnostic test is used very extensively in advising students, 
determining their requirements and the scope of their electives. 





EVIDENCE OF RESULTS 

There are times when every teacher is challenged to 
show evidence of results of teaching. In some cases that challenge 
is from a principal when his cooperation is asked for physical 
education. In other cases it may be parents or their representatives, 
the school board. During the years of educational reorganization 
and planning following the war this attitude will be more prevalent 
than ever before. Everyone is asking, “What can physical education 
do for the fitness and for the education of the students?” They will
be asking for information and asking with an open mind, ready to 
acknowledge merits in a good program. 

An opportunity has been handed to the physical education pro- 
fession by war conditions and augmented by the attitude of those 
administering the training programs of the armed services for 
both men and women. All branches of service have devoted an 
unprecedented amount of time to conditioning, have given more 
care than ever before to planning of those programs, have made 
greater efforts to encourage physically active types of recreation, 
and most branches have attempted to place such programs in the 
hands of personnel trained in physical education. All branches of 
military service have made extensive use of tests to determine 
aptitudes and capacities, measure learning, determine stamina, and 
in general to promote or reject individuals for specialized service. 
These factors will contribute to a more favorable attitude toward 
objective tests and broad physical education programs. 

Teachers may be asked by students to show evidence of results. 
The poor student may say, “I haven’t learned anything, I might 
as well quit trying.” The best answer to that is a test sufficiently 
easy that it will register performance, discriminating enough to 
indicate small increments of ability, and on a scale which seems 
to reward slight effort. Then by repeated use of this test the re- 
sults become apparent. It has been shown that college students 
at the low end of the motor ability scale show great interest in 
their ability and derive considerable satisfaction from modest 
amounts of improvement when testing procedures are used in 
connection with instruction (see reference 12 in the bibliography at
the end of this chapter). For example, they may be learning
the principles of effective jumping while using jumping in games
such as basketball and volley ball. The tests which are used from 
time to time may be the vertical jump, or jump and reach, or a 
broad jump. These are tests which may be objective, are adminis- 
tered easily, and afford an opportunity for the student to apply 
the principles which have been learned. 

Likewise, better students may unconsciously express a need for 
substantial evidence of their ability when they say, “Oh, I can do 
that, I don’t need to practice.” For such students test scores are 
usually surprising. They fall below the standards which they them- 
selves expect, or below scores achieved by other members of their 
group whom they would like to surpass, or below the standards 
shown in achievement scales. Many students are automatically 
stimulated by such comparisons. 

In some cases the teacher himself may debate the merits of differ- 
ent methods or parts of the program, or doubt the adequacy of 
results. Objective evidence through tests is the most satisfactory 
answer to this wholesome criticism by the teacher. 

INFORMATION AND UNDERSTANDING AS 
A PART OF PHYSICAL EDUCATION 

Most laymen and some teachers consider physical 
education to be a matter of physical activity, perhaps developing 
motor skills and contributing certain physiological results. Factual 
content, understanding and appreciation, attitudes, and general 
intellectual growth are not always recognized as an important part 
of the process. When these are accepted as desirable outcomes and 
planned for in teaching, they must be tested. Since these are objectives
common to all forms of education, other educators have
pointed the way to testing. This is done principally by written, 
objective examinations. The success depends upon the skill with 
which the teacher constructs the examination. The value for the 
student may be the same as for the skill tests. 




TESTING AS A PHASE OF TEACHING 

Tests may be chosen for a given purpose but what is 
to insure their effectiveness? The tests themselves, though inher- 
ently good and administered according to standard instructions, 
will not guarantee effectiveness. The guarantee lies principally in 
observing the same rules that constitute good teaching. The good 
teacher presents his activities with enthusiasm, confidence, and faith 
in what he is doing. The students will respond with enthusiasm and 
interest. The whole atmosphere and all results are in contrast to that 
created by a teacher who teaches an activity because it is easier than 
some other one he might choose, or because he knows that many 
other teachers teach it and assumes, therefore, that it must be accept- 
able. The same situation exists in lessons which include testing. The 
tests which are used will seem important and interesting to the 
students if the teacher understands and appreciates their value 
and demonstrates confidence in them. 

Every test which is given should serve the student in some way 
just as every activity included in the program should be planned 
with the student’s welfare uppermost in mind. If the student is to 
receive greatest benefit and really understand the test, his score 
must always be made known to him. Not only should the student 
be told results, but he should be informed as soon as possible. If 
too much time elapses he loses interest in that particular test and 
may acquire an attitude of indifference for future tests. Also, if the 
student is to plan future work on the basis of test results no time 
should be lost. Information concerning results also implies enough 
interpretation of those results that he can really appreciate his own 
status, plan intelligently, and recognize accomplishment when it 
does occur. 

Following up individual needs calls for carefully planned teach- 
ing. Tests serve as one of the best sources of outlining needs and 
should be recognized thus by the student. This is an effective means 
of motivating an interest in learning activities. Effort is greater 
when the test is accepted as a personal aid rather than something 
for the teacher’s benefit. Testing becomes important for the indi- 
vidual; he sees a means of learning faster, and therefore, the 
whole physical education program appears more significant. 




Test results should be studied carefully and recognition given 
for good performance or improvement. Likewise, encouragement 
can be given the student who is known to have worked hard but 
who registered only moderate improvement on the test. If atten- 
tion is concentrated only on the cases which show great gains many 
students fail to derive the benefits they should from the use of the 
test. It must be kept in mind that improvement at the upper limit 
of the scale is much slower than at a lower level. The superior 
performer usually secures his share of instruction in preparation 
for competitive activities but seldom does he secure adequate 
attention and motivation in the regular physical education class. 

Wherever any plan of individual selection of activity is in opera- 
tion some form of counseling or guidance is essential. Such guid- 
ance can depend to a considerable extent on test scores. The stu- 
dent can be guided into the activity where instruction will be at his 
level, and where weaknesses can be overcome. On successive occa- 
sions progress can be charted and encouragement or stimulation 
given. If guidance extends to related recreational choices he can 
also be led into activities where he is most apt to be successful and 
happy. By such planning and the relating of testing to practice and 
playing the student sees the relationship between testing, learning, 
and his ultimate goal of enjoying activities, or playing on a team. 
Colleges have used this system extensively not only with major 
students but in planning for all college students. Similar plans 
can be used with high school students even though no choice can 
be given the student of section or activity. Choices may be offered 
within the hour, squads may work on different skills all related to 
the same activity, different self-testing or practice tests may be used 
for different squads, and always the student may be given oppor- 
tunity to measure achievement. 

Tests which put an emphasis on individual improvement almost 
always appeal to the older children in the elementary school and 
junior high school. For that age tests serve as a form of competition 
and an opportunity for individual recognition. Therefore, the 
tests should be varied enough that the same students will not 
always be in the upper brackets. To make the fullest use of this
competitive urge, recognition should be given before the whole group
to the individuals with especially high scores, to those making the
greatest improvement, or to the squads with the best scores.
The use of achievement scores for comparison with other schools 
or similar groups adds zest and competitive spirit to the testing. 
With older groups this is not as potent a factor but it does continue 
to operate in some cases. 

The teacher must consider the proportion of time which can be 
devoted to testing just as he must consider the proportion to be 
devoted to each unit; the proportion between drill and actual par- 
ticipation in the activity; the proportion between discussion, 
verbal description or moving pictures and actual practice. In 
every case there are advantages to be derived, but all objectives 
must be kept in mind and the relative merits of different approaches 
or proportions weighed. Likewise, testing can provide some real
values; a greater time allotment may permit a more extensive
testing program, but testing will probably not keep the same
proportion of the time as in a smaller time budget. There may even be
situations in which the time allotment is so brief that testing may be
reduced to a very low point. The most important rule to follow 
is that every test should serve the student. If no time remains after 
the testing is done, no follow-up can be made on it. If too little 
testing is done, inadequate information is available for teaching. 

Testing is a device for teaching and learning. The successful 
teacher is the one who knows the best devices and aids to learning 
and who uses them skillfully. Under the direction of such a teacher 
tests fit in so smoothly and naturally that they are accepted by the 
students, are used by the student and teacher, are understood by 
the other teachers, are appreciated by the parents, and actually 
become inseparable from the teaching-learning process. 

SUMMARY 


The above discussion has classified tests according 
to general use in relation to teaching. Usually tests have been 
classified according to specific purpose or ability concerned. Some- 
times this classification has been rather remote from the teacher’s 
problem of improving instruction, and often confusing because it 
seemed to call for too wide a variety of tests. It is possible for a 
single test to serve more than one purpose. The main values to 



Use of Tests 9 

be derived from tests, as discussed above, may be achieved through 
different types of tests. The types named here are from the popular 
classification. 

Motivation. If tests are to be used to increase interest and effort 
the teacher may select achievement tests of general or specific 
ability, written tests, diagnostic tests, or practice tests. 

Status of Students. If the teacher needs to know the relative 
status of students or the ability of a class, he may select from the 
so-called achievement tests and use an achievement scale, the
classification devices, the motor ability, motor capacity, motor
educability, or written tests. Also if he is concerned with some special
ability he may use the cardiac functional, strength, endurance,
physical capacity, or orthopedic test as the case may require. The 
information on status enables the teacher to organize class work, 
give individual help and guidance, determine pupil progress, and 
assign grades. 

Diagnosis. For diagnostic purposes the teacher may use any of 
the types listed under status; for status he is most apt to be con- 
cerned with a single type. In diagnosis, scope and comparison are 
essential. Therefore, he needs several forms and a scale on identi- 
cal units where direct comparison is possible. 
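As a minimal sketch of what "a scale on identical units" can mean in practice, the example below converts raw scores from two tests into T-scores (mean 50, standard deviation 10) so that one student's results can be compared directly. The T-scale is one common choice and is an assumption here, not necessarily the scale intended above; the function name `t_scores` and the class data are hypothetical.

```python
from statistics import mean, pstdev

def t_scores(raw_scores):
    """Convert one group's raw scores to T-scores (mean 50, SD 10)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD of this group
    if sd == 0:
        # Every student scored the same; all sit at the group mean.
        return [50.0 for _ in raw_scores]
    return [50.0 + 10.0 * (x - m) / sd for x in raw_scores]

# Hypothetical data: the same five students on two tests in different units.
jump_inches = [12, 14, 15, 17, 22]
throw_feet = [40, 55, 60, 62, 83]

jump_t = t_scores(jump_inches)
throw_t = t_scores(throw_feet)

# On the common scale, the lower of a student's two T-scores marks
# the relatively weaker skill within this group.
weakest = ["jump" if j < t else "throw" for j, t in zip(jump_t, throw_t)]
```

Because both lists are now on the same scale, a student's relative weakness can be read directly, which raw inches and feet do not allow.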

Evidence of Results. Again the selection of tests may be from 
almost any type. The determining factor will be the objectives 
which the teacher emphasizes and the information which is desired. 

For successful use of tests the teacher must start with a clear 
understanding of his objectives, evaluate and select tests carefully, 
administer accurately, and follow up the scores for greatest benefit 
to the student. 


BIBLIOGRAPHY 

1. Bovard, John F., and Cozens, Frederick W.: Tests and Measurements in Physical
Education. W. B. Saunders Company, Philadelphia, 1938, Chapter I.

2. Carpenter, Aileen: “The Future of Tests and Measurements in the Elementary
Schools,” Journal of Health and Physical Education, 15, November 1944, p. 479.

3. Cassidy, Rosalind, and Kozman, Hilda C.: Physical Fitness for Girls. A. S.
Barnes & Company, New York, 1943, Chapter 10.

4. Howland, Amy: “Contributions of the Achievement Tests to the Athletic
Program,” Journal of Health and Physical Education, 10, April 1939, p. 214.

5. Kistler, Joy W.: “A Study of the Results of Eight Weeks of Participation in a
University Physical Fitness Program for Men,” Research Quarterly, 15, March
1944, p. 23.

6. Krakower, Hyman: “Testing—Overdone or Underdone?” Journal of Health
and Physical Education, 13, May 1942, p. 285.

7. Larson, Leonard A., and Cox, Walter A.: “Tests and Measurement in Health
and Physical Education,” Research Quarterly, 12, Supplement, May 1941, p. 483.

8. Lee, Mabel: The Conduct of Physical Education. A. S. Barnes & Company,
New York, 1937, Chapters 9 and 15.

9. McCloy, C. H.: “Intelligent Use of Tests,” Journal of Health and Physical
Education, 8, May 1937, p. 297.

10. McCloy, C. H.: Tests and Measurements in Health and Physical Education,
Second Edition. F. S. Crofts and Company, New York, 1942, Chapter 24.

11. Reichart, Natalie: “School Archery Standards,” Journal of Health and Physical
Education, 14, February 1943, p. 81.

12. Salit, Elizabeth Powell: “The Development of Fundamental Sport Skills in
Freshman College Women of Low Motor Ability,” Research Quarterly, 15,
December 1944, p. 330.

13. Segel, David: “An Appraisal of the Influences of World War II on Testing
Practices,” Education for Victory, 2, April 20, 1944, p. 3.

14. Scott, M. Gladys: “The Use of Skill Tests,” Journal of Health and Physical
Education, 9, June 1938, p. 364.

15. Spindler, Evelyn: “Do You Grade or Guess,” Journal of Health and Physical
Education, 2, October 1931, p. 96.

16. Wilbur, Ernest A.: “A Comparative Study of Physical Fitness Indices as
Measured by Two Programs of Physical Education: The Sports Method and the
Apparatus Method,” Research Quarterly, 14, October 1943, p. 326.



II.


Characteristics of a Good Test 


Fortunately in some areas there is a possibility of 
choice between tests. The question which then presents itself is, 
“What are the inherent qualities of a test which make it useful 
or really good?” 

The criteria most frequently considered pertinent on this point 
are more or less uniform. Usually evidence on reliability and 
validity is presented. Objectivity is commonly assumed in con- 
struction of the test and some consideration is sometimes given to 
economy of time in administration. However, most of this evi- 
dence is statistical in nature and requires knowledge both of 
statistics and of tests in order that the reader be fully cognizant of 
the implications. 

There are criteria other than reliability and validity by which 
the teacher may study prospective tests. These supplement and 
explain the first type of criteria and make application of principles 
well known to teaching. They will be discussed first. 

TESTS SHOULD MEASURE IMPORTANT 
ABILITIES 


The ability required in any test should be the same 
as that required in the more general performance which is being 
tested. This point seems very obvious and yet it is sometimes 
ignored in the concentration on statistical treatment. It is more 
clearly explained by some examples from sports. A sports test is 
usually given to measure general ability in that sport or specific 
ability in one technique necessary in that sport. The significance 
of a specific ability for success in playing that game is dependent 
upon its relative importance with all techniques required in the 
game. 

For example, the mistake is sometimes made of using a test
known as the throw-in as a measure of soccer ability. This is largely
because “statistical evidence” has been presented to indicate that 
it does measure soccer ability. Truly enough, the throw-in will 
correlate rather highly with general ability in soccer. It would be 
equally true with almost all sports. The reason is that throwing 
is a general basic skill, and throwing tests are an important part 
of any general ability battery. If one studies the game of soccer, it 
is evident that a throw of this type is used only occasionally by the 
goalkeeper and by the right and left halfbacks or fullbacks when 
the ball goes out of bounds. Other than that the players are not 
allowed to use their hands on the ball. It should be apparent, then,
that for the purposes for which sport tests are used, logical as well
as statistical considerations should govern the selection. A very
similar limitation is found in hockey on the use of the roll-in, yet 
the roll-in is frequently found in hockey test batteries. 

Sometimes an almost identical situation is found in basketball 
testing. Some test batteries use the throw for distance as a measure 
of basketball ability. Again it is an excellent measure of general 
ability. Since players with high general ability make better basket- 
ball players than those with low ability, it is obvious that high cor- 
relations can be obtained. However, a good basketball team sel- 
dom makes use of long passes so it hardly seems desirable to tell 
the players that their ability as basketball players will be rated by 
a distance throw. 

In selecting tests from various sources the teacher should follow 
the same procedure that must be used when creating test batteries. 
That procedure is to outline the skills used and then devise tests 
each of which uses one or more of those skills. 

Two of the sports mentioned above will be used to illustrate 
this step. Soccer playing requires primarily (1) ability to play the 
ball and (2) ability of the player to move quickly with good weight 
control. Ability to play the ball can be subdivided into passing, 
dribbling, goal shooting, blocking, or trapping. In almost every 
case this means using the feet for that purpose. Ability to move 
quickly can be subdivided into running; running with the ball,
i.e., dribbling; change of direction; and footwork for kicking,
feinting and dodging. As one further studies these abilities it is
apparent that passes are rather short, with the exception of place
kicks which are sometimes long; that passing is usually done imme- 
diately after receiving the ball, and while the player is moving; 
that goal kicking is almost invariably on a moving ball at some 
distance from the goal, with more or less interference between the 
kicker and the goal. Likewise, weight control is inseparable from 
ball handling, running must be fast but for short distances with 
quick changes of direction. When this type of analysis is made, the 
teacher would have little reason to select the throw-in, or the place 
kick at goal, or the straight dash or straight dribble. 

A similar analysis could be made of basketball. That would 
make apparent the fact that passes are usually short, and made 
quickly after receiving the ball; that shooting is usually from 
rather close range, and almost always must be done while moving; 
that running is rapid but changes of direction are frequent and 
abrupt. A careful analysis of the game would doubtless be followed 
by rejection of such tests as the throw for distance, or standing and 
throwing at a stationary target on the wall, or free throw shooting. 

TESTS SHOULD BE LIKE GAME SITUATIONS 

The tests should be as nearly like the game situation 
as possible. A service stroke is always taken from a stationary posi- 
tion and usually with no great pressure of time. Therefore, a test 
of serving ability may be set up very successfully as successive trials 
from a given spot at a given target. However, the later strokes or 
returns in the game are made on an approaching ball, which may 
require considerable footwork and must be timed with the ap- 
proach of the ball. It is seldom that such a play can be slow and 
deliberate. Play is continuous and the test should be continuous. 

One fallacy too frequently put into practice, and also into print, 
is that when skills are combined in a test, it makes the test game- 
like. That assumption results in such basketball tests as a pivot and 
shot, or a bounce and shot, or soccer tests such as the dribble and 
goal kick. Each is scored by the percentage of successful goals. This 
ignores the element of speed which is almost always present, and
the possibility of poor form or actual infringement of rules which
may take place. The latter is especially frequent in the basketball 
bounce and pass test. Even if violations are considered it means
an extra trained helper for each player taking the test and the 
decision of that helper is subjective. If test situations can be set up 
where the player makes his own footwork fit the situation and con- 
tinues the play the results are usually more satisfactory. 

The time element is much more important than is usually real- 
ized unless a systematic study of the problem has been made. Let us 
consider an example from softball. Successive throws from a 
standing position at a stationary target proved very poor. A ball 
sent from a catapult, caught, and thrown immediately at a target 
proved somewhat better. However, that offered two difficulties. 
First, the use of a catapult is impractical. Second, it was still pos- 
sible for the player to hesitate long enough after the catch that the 
situation was almost the same as in the stationary throw. The next 
step in the development of this test was to change the sequence 
and add a time control. The player then threw the ball at the wall, 
fielded it as though playing a baseman’s position, and threw to a 
target which represented another baseman. Time was counted 
from the start of the ball on the first throw to the hit of the second 
throw. Accuracy of the second throw was recorded. The actual 
time required for those throws proved relatively unimportant 
except in the extremes, but it brought the throwing into the same 
timing as experienced in the game. Therefore, performance on the 
test corresponds much more closely with actual performance in 
the game. 

Timing in the above test is one of the few examples where 
records are taken and no very definite follow-up made of results. 
However, it has served its purpose of making the test psychologi- 
cally game-like. The difference in the two forms of the test is 
comparable to the difference between pegging the ball around the 
bases for practice and trying to beat the runner to base for an out. 

TESTS SHOULD INVOLVE ONLY ONE PERFORMER 

The above discussion has doubtless suggested the 
use of two or more players in a test situation since the player in 
the game must always consider the person from whom he receives 
the ball, to whom he passes, and very frequently his opponent’s 
actions. Such an arrangement would satisfy the standard of game
similarity. However, no one standard can be considered in isola- 
tion from the rest. The cooperation or competition for players A 
to Z must be identical when being tested. It is perfectly obvious 
that they must all have good balls, the same size target and other 
equipment. It is equally obvious that one player should not be 
tested with a partner who fumbles and passes wildly while another 
has an excellent player. 

The test should involve only the one person being tested. This 
accounts for the very frequent use of repeated throws or volleys 
against a wall. The objection is that this type of test presents an 
artificial situation. This is true. However, it represents a compro- 
mise between the two criteria of game similarity and a single per- 
former. A repeated volley test does aid in a way in producing game 
similarity in that the player both receives and passes the ball, and 
play is continuous. Moreover, the player alone is responsible for 
the results. 


TESTS SHOULD ENCOURAGE GOOD FORM 

Another problem of real importance in some sports 
is that of measurement relative to form. For example, a tennis 
player may be able to place a ball in a specified area on the court 
but the flight of the ball may be very slow, arched and followed by 
a high bounce; the return of such a ball would be extremely simple. 
Another may send the ball in the same area but it travels with 
speed, in a flat path with little bounce. Some tests do little to dis- 
tinguish between these two players. 

The best solution to this difficulty would appear to be either the 
introduction of the time element or a subjective rating of form 
to supplement the test score. The first approach is used in a test 
such as the Dyer backboard test (see p. 100). However, this still
ignores form and as usually given does little to eliminate that
objectionable characteristic. That can be partially overcome by 
moving the restraining line back to 25 feet (perhaps with the 
privilege of stepping across for a single stroke to be followed by 
another from behind the line). That prevents a player from stand- 
ing near the wall, volleying the ball, and receiving a high score 
because the ball travels only a short distance. When everyone plays
from a more nearly uniform position, the player with power and 
control has opportunity to score above inferior players. 

TESTS SHOULD PROVIDE ACCURATE SCORING 

The objectivity of a test depends upon the certainty 
with which the trial can be scored a success or failure, or for a given 
value on a target, stopwatch or measuring tape. A basket with a 
net leaves little doubt as to whether the ball went through or 
dropped outside. A badminton bird which is to be scored as going 
above or below a rope is more difficult to see. A ball thrown at 
high speed at a target is difficult to judge as inside or outside a 
line unless special equipment is used for that purpose. 

A simple form of target to construct for accuracy throwing is 
made of wood. The center of the target, and alternate circles out 
from the center, are made of tin pieces attached to the wooden 
background. The sound of hits in two adjacent areas makes it pos- 
sible to definitely judge the accuracy of the hit. 

Distance covered by a basketball bounce and the legality of a 
play in terms of traveling are difficult to judge. When a race is 
judged by the zone in which the runner finished, errors in judg- 
ment also creep in. The faster the runner, the greater the chance of 
error. If given indoors on a course adjacent to a wall, the numbers 
can be placed on the wall instead of the floor. 

TESTS SHOULD PROVIDE A SUFFICIENT 
NUMBER OF TRIALS 

Trials should be sufficient to eliminate chance devia- 
tions from a truly representative score. One trial may be sufficient 
under optimum conditions. For example, a race is usually set up 
with uniform conditions for all. Assuming a good timer and proper 
motivation, an accurate measure of maximum effort can be ob- 
tained from a single run. Ability on short runs will not vary much 
from day to day; ability on longer endurance runs might vary
more, but they are usually used for a different purpose. 

The number of trials necessary for a given test can be deter- 
mined only by experimentation. Some general rules can be stated, 
however. Most tests of maximum effort can be measured best by
one to three trials. This would include items such as a dash, throw 
for distance, strength events, and speed events where control is 
relatively unimportant. When a high degree of accuracy is neces- 
sary the number of trials required goes up. It may run then from 
five to thirty, usually between ten and twenty. Frequently the 
number of trials necessary for a group of advanced players will be 
fewer than for less experienced players on the same test. 

If the test is being used for a practice device or for motivation, 
the trials may be reduced more than if the purpose is classification 
or grading. For example, it may be desirable to have some players 
practice on free throws. As a practice device five to ten may be the 
maximum number of throws that is practical; and in this form 
a test may be valuable as a motivator as well as providing practice. 
However, it is known that this is an insufficient number for re- 
liable estimates of a player’s ability. 

TESTS SHOULD BE INTERESTING AND 
MEANINGFUL 

The test should appeal to the students if best efforts 
are to be obtained. This is partially a problem of administration. 
The means of motivation are varied: using individual score cards 
with continuous records; posting scores and names of the best 
performers, or of those who make greatest improvement; promo- 
tion to better teams or squads when sufficient improvement is 
made; or comparison of scores with accepted standards. However, 
there are certain inherent qualities which attract or repel student 
interest. Game similarity is the first step toward a favorable atti- 
tude; the test then has meaning. 

When a student knows how well he is doing on the test, or at 
least what the score is when the test is finished, he is more inter- 
ested. Probably, one reason for the popularity of the basket shoot- 
ing tests is the fact that the student can see immediately how each 
attempt is scored and each successive shot presents a new chance 
under identical conditions with the first. 

Different types of scoring carry different meaning to the stu- 
dents. For example, some tests may be scored either as the number 
of hits in a given time interval or as the time required to make a
specified number of hits. A throw or hit is perfectly understand- 
able to the student. However, when results are expressed in time 
intervals, number of seconds, it is a rather vague concept. It is 
true that the student can make comparisons with others even if he 
does not fully understand the score, but the standard of an opti- 
mum or ideal performance is lacking. 

Tests should not be so time consuming that the student associates
them with slow moving class periods when he is deprived of
the opportunity to play the game. The tests should fit into the 
time that is ordinarily devoted to practice on techniques. 

Most sports tests do not produce any undesirable after-effect. 
However, some types of tests which are done for endurance may 
result in severe muscular soreness. Few girls delight in that sore- 
ness or even take a stoical attitude toward it. The result is a dis- 
like for the test which produced it, or perhaps for all tests, because 
they do not understand the causes of the soreness or the types of 
tests which produce soreness. Boys are less apt to be affected by 
this factor. 

DIFFICULTY OF THE TEST SHOULD BE 
ADJUSTED TO THE ABILITY OF THE GROUP 

A test may be easy or difficult according to the way in 
which it is done. Tests may need to be modified to meet the ability of 
the group being tested. It is not recommended that tests be re-made 
freely since one can no longer assume constancy of desirable 
characteristics in the altered form. However, strength tests may 
be made easier by changing the kind of support, runs may be 
shortened to the point where students will keep trying through- 
out, and throwing distance may be shortened so that all can throw 
as far as the target. 

The rule to be applied here is that scores should show a reason- 
able distribution. This means there should be no massing of scores 
at any one point. If the test is too easy, most of the scores will be 
at the maximum. If the test is too difficult most of the scores will 
be zero or near it. In such cases the form of the test may be changed. 
However, it is better either to postpone the test until the class 
has developed the necessary ability, or to select another test. 
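
The rule about massing of scores can be checked with a few lines of
arithmetic. The sketch below is only an illustration: the function
name, the sample scores, and the cut-off of one half of the class
(used here to stand for "most of the scores") are assumptions and do
not come from the text.

```python
def massing_report(scores, lowest, highest, threshold=0.5):
    """Report how many scores pile up at either end of the scale.

    `threshold` is an assumed cut-off for "most of the scores";
    the text gives no numerical rule.
    """
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= lowest) / n
    at_ceiling = sum(1 for s in scores if s >= highest) / n
    if at_ceiling >= threshold:
        verdict = "too easy"
    elif at_floor >= threshold:
        verdict = "too difficult"
    else:
        verdict = "reasonable spread"
    return at_floor, at_ceiling, verdict


# Hypothetical scores on a ten-trial target test (0 to 10 hits).
print(massing_report([10, 10, 9, 10, 10, 10], lowest=0, highest=10))
print(massing_report([0, 1, 0, 0, 2, 0], lowest=0, highest=10))
print(massing_report([2, 4, 5, 5, 6, 8], lowest=0, highest=10))
```

A verdict of "too easy" or "too difficult" suggests postponing the
test or selecting another, as recommended above.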





TESTS SHOULD BE JUDGED BY CRITERIA WHICH 
CAN BE APPLIED STATISTICALLY 

How do all of the above considerations relate to the 
usual textbook list of criteria, namely: validity, reliability, objec- 
tivity, economy? Reliability, or consistency of measurement of the 
same degree of ability, is possible only if the performer is interested 
enough to give maximum effort, if the trials are sufficient in 
number to eliminate chance factors, if his own skill only is meas- 
ured, if equipment and test conditions are uniform, and if measur- 
ing units are objective. 

Validity is defined as the degree to which a test measures what it 
is supposed to measure. As stated before, logic as well as statistics 
should be applied in determining that characteristic. 

Economy results primarily from careful selection of a few highly 
valid tests which are administered efficiently (see Chapter 3). 

Norms or standards of achievement undoubtedly add to the
usefulness of the test, but their absence is not an insurmountable
obstacle. If achievement scales are not available, a substitute scale may be
constructed on a given class or school group. This is most easily 
done by the T-scale (see p. 222). In some cases this may be prefer- 
able to using published scales since the standards are then in terms 
of that particular group. 

If one is to make use of the work of others on test evaluation 
certain statistical concepts are essential. Frequency distributions, 
measures of central tendency and of dispersion, product moment 
correlations and regression equations are formidable sounding 
terms. A speaking acquaintance with the terms as a handle for the 
concepts they represent is adequate for evaluation of testing 
reports. 

Basic to most of these concepts is that of the range of ability or 
distribution of scores. In any large group, unselected for the ability 
in question, there will be considerable variation in the level of 
that ability. If the measuring device is sound the range of ability 
will be shown. However, there will not be an equal number of 
individuals at all points along the range. In fact there will be only 
a very few cases at each extreme. Points toward the center of the 
range show an increasing number of cases. When put into graphic
form it is a symmetrical “bell-shaped” curve. In other words indi- 
viduals tend to be alike in ability, but there are always variations 
toward superior and inferior ability. The measuring device must 
discriminate between ability at all levels. It is in the middle of the 
range, or in the area of similarity, that discrimination is apt to 
be inadequate. The achievement scale is simply an elaboration 
of the distribution of ability shown by a given group on a given 
test. 

This last statement then suggests that distributions vary for 
different groups. Since this is true, one is justified in expecting every 
report of test construction to describe the population on which the 
report is based. That is, it should state the age or grade level; the 
sex; number of cases; and special characteristics, if any, which 
would affect the results. 

The reliability and validity of a test are expressed as correlation 
coefficients. Such coefficients are merely a numerical expression of 
a degree of relationship. The reliability coefficient expresses the 
relationship between two consecutive administrations of the test
and therefore indicates its consistency of measurement. It is not always
practical to give two duplicate administrations when studying 
tests. As a substitute on tests where several trials are given, the 
alternate trials may be correlated (the so-called odd-even method 
of determining reliability). This certainly measures the consist- 
ency within the series of trials. It must be remembered that such 
a coefficient will usually be lower than if two administrations had 
been given, because the correlation is actually made on half of the 
series of trials. Correction is possible for that difference and is 
sometimes made. The method of correction most commonly used 
is by the Spearman-Brown Prophecy formula. The reader should 
note in every case whether such correction has been made in order 
to know how to interpret the coefficient. 
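
The odd-even method and its correction can be sketched as follows.
This is a minimal illustration under stated assumptions: the trial
scores are invented, the function names are hypothetical, and the
correction shown is the usual two-half form of the Spearman-Brown
formula, r_full = 2r / (1 + r).

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_reliability(trials_by_student):
    """Return (half-test r, Spearman-Brown corrected r)."""
    # Slicing from index 0 picks trials 1, 3, 5, ...; from index 1
    # picks trials 2, 4, 6, ...
    odd = [sum(t[0::2]) for t in trials_by_student]
    even = [sum(t[1::2]) for t in trials_by_student]
    r_half = pearson(odd, even)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Invented scores: five students, six trials each.
trials = [
    [3, 4, 3, 5, 4, 4],
    [1, 2, 1, 1, 2, 2],
    [5, 5, 4, 5, 5, 4],
    [2, 3, 2, 2, 3, 3],
    [4, 3, 4, 4, 3, 5],
]
r_half, r_full = odd_even_reliability(trials)
print(round(r_half, 3), round(r_full, 3))
```

The first figure is the uncorrected odd-even coefficient; the second
is the estimate for the full series of trials, which is why a report
should say which of the two it is quoting.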

The validity coefficient expresses the degree of relationship be- 
tween a criterion and the test. The higher the relationship, the 
more truly does it measure the ability in question. It is obvious 
since the criterion is the yardstick against which the test is com- 
pared, that the criterion must be a good one. The reader has the 
right, therefore, to expect a statement of the criterion used. If 
ratings are used they must be acceptable (see p. 154). 




Since reliability and validity are expressed as correlation coeffi- 
cients, it is necessary to interpret these quantitative values. There 
must be a clear distinction between .90 and .09; between +.90
and -.90; also one must understand how much better .90 is than
an .80, and why coefficients are always decimal values and never 
whole numbers. 

These points can be understood best by a consideration of the 
principle and procedure by which the coefficient is computed. The 
correlation is a mathematical expression of relationship between 
two factors or abilities as measured on a given group of persons. 
If it is a reliability correlation, the ability as measured on the first 
day is compared with a similar performance by the same group 
on a second day. If it is a validity coefficient, one factor is the cri- 
terion by which the measure is being checked, and the other is the 
experimental measure itself. The correlation is essentially a process 
of plotting the two factors under consideration into a single graph. 
This graph is called a scattergram. Let us consider Figure 1. The
vertical scale OC represents the distribution of cases on the
criterion. If fifty cases had been placed in rank order on that scale
then #1 is at the top or high end of the scale and #50 at the low
or zero end of the scale. All other cases are in successive order
between these two. Likewise, each case has been placed in rank
order in the experimental measure to be plotted according to XO
from high down to low.

Figure 1. Diagram of Correlation Graph or Scattergram

The effects of similarity and discrepancy in these two rank
orders are demonstrated in Figure 2. In part a it is assumed that the
two orders are in perfect agreement. The same person would be in 
the top position each time and would be represented by #1 on 
the scattergram. All other cases would be similarly ranked and 
their placement on the graph would be in a straight line from O 
diagonally upward. The computation is such that if this occurred 
the coefficient would be 1.00. Such an instance probably would 
never occur because of chance variations, differences in effort, 
errors in measurement and other similar shortcomings of measur- 
ing techniques. Occasional slight variations in rank may occur 
and the plotting on the scattergram would then appear more 
as in Figure 2, part b. Such a scattergram shows the same general
direction in the design of tallies across the graph but they do not 
hold to a narrow straight line. A scattergram such as b would give 
a coefficient of approximately .90, or something a little less per- 
fect than a. 

On the other hand if there is no similarity between the rank 
orders a scattergram such as c would occur. In that case an indi- 
vidual who is low on one axis may be anywhere from low to high 
on the other, and the individual who is high on the first may be 
of any rank in the second factor. Such a heterogeneity of tallies 
would produce a coefficient at or near zero. 

There might be instances where the rank order of one scale is 
completely reversed in the second. Such a scattergram is shown in 
part e. The tallies all fall in a straight line as they do in a except 
that they run from C to X instead of from O up through the 
graph. The scattergram in e would also give a coefficient of 1.00 
but it is distinguished from the 1.00 in a by a negative sign. Part a 
shows a direct and exact correspondence of the two scales; its 
coefficient is called positive (written +1.00) and it expresses
relationship which is direct. Part e shows a perfect reversal of
the two rank orders; its coefficient is called negative (written
-1.00) and it expresses an inverse relationship between the two
measures. 
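
The effect of agreement and reversal in two rank orders can be tried
out numerically. The sketch below does not use the scattergram
procedure described in the text; it uses the rank-difference formula,
rho = 1 - 6(sum of d squared) / (n(n squared - 1)), which gives the
same value as the product-moment coefficient when the ranks contain
no ties. The five-person rank lists are invented for illustration.

```python
def rank_difference_rho(ranks_a, ranks_b):
    """Rank-difference correlation for two untied rank orders."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


criterion = [1, 2, 3, 4, 5]        # rank of each person on the criterion

same_order = [1, 2, 3, 4, 5]       # perfect agreement, as in part a
one_swap = [1, 2, 4, 3, 5]         # a slight variation, as in part b
reversed_order = [5, 4, 3, 2, 1]   # perfect reversal, as in part e

for label, test_ranks in (
    ("same order", same_order),
    ("one swap", one_swap),
    ("reversed", reversed_order),
):
    print(label, rank_difference_rho(criterion, test_ranks))
```

Identical orders give the positive limit, a complete reversal gives
the negative limit, and a single exchange of neighbors falls a little
short of perfect agreement.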

It should be apparent that perfect inverse coefficients are no
more likely to occur than perfect positive ones. Therefore, part d
represents the more usual form of a high, inverse relationship.

Figure 2. Scattergrams Yielding Different Coefficients
a: +1.00 coefficient
b: +.90 coefficient
c: zero coefficient
d: -.90 coefficient
e: -1.00 coefficient

No exact arbitrary points can be set along the numerical range 
of coefficients as a point of significance or as a high or low degree 
of relationship. Generalizations can be made, however. Higher 
values are necessary for reliability coefficients than for validity 
coefficients. This is true because the validity coefficient cannot 
be higher than the reliability of the measure, in fact it is always 
somewhat lower than its reliability. Reliability coefficients of 
highly complex skills used in physical education are usually lower 
than those to be found on tests of mental capacity or achievement. 
Tests which call for extreme all-out effort, such as endurance tests, 
frequently will be less reliable than those which use a submaximal 
effort. Tests on girls are usually less reliable than similar tests on 
boys, apparently since it is more difficult to motivate the girls to 
their best efforts. The performance of inexperienced players is 
usually less reliable than that of highly skilled ones. The reliability 
of a measure increases with the number of trials, though it is far 
short of being in direct proportion to the increase in number. A 
coefficient computed on many cases can be relied upon as being 
more stable than a similar one computed on very few cases. A 
minimum of fifty cases is desirable, preferably a hundred or more. 
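
One common way to estimate how reliability grows with added trials
is the general form of the Spearman-Brown formula,
r_k = k r / (1 + (k - 1) r), where k is the factor by which the
number of trials is multiplied. The text does not name a method at
this point, so the formula is offered as an assumption; it also
supposes that the added trials are comparable to the original ones.
The starting coefficient of .60 is hypothetical.

```python
def predicted_reliability(r, k):
    """Spearman-Brown estimate when the number of trials is multiplied by k."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical test with a reliability of .60 at its present length.
for k in (1, 2, 4, 8):
    print(k, round(predicted_reliability(0.60, k), 3))
```

Each doubling of the trials adds less than the one before it, which
is the sense in which the gain is far short of direct proportion.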

With the above points in mind let us then generalize on quali- 
tative interpretations of numerical coefficients. Anything above 
.85 is considered very good, but above .95 almost impossible. 
From .75 to .85 is considered adequate for many purposes. As re- 
liability coefficients drop below the .75 value they indicate an 
inconsistent and poor measuring tool. A validity coefficient below 
.60 to .65 indicates poor predictive value. 

It must not be assumed that low coefficients are meaningless. 
Rather they indicate a lack of relationship and that fact should 
be interpreted accordingly. A low reliability coefficient, indicating 
inconsistency of measurement may suggest additional trials, or 
improvement in details and conditions under which the test is 
conducted. A low validity coefficient would indicate that the test 
is worthless for predicting the ability which it was assumed to 
measure. However, if it seems to have some merit as determined 
subjectively and is shown to be highly reliable it might make a
very good practice test and serve the purpose of securing interest 
and effort from the pupils. (Free throw shooting in basketball is 
an example.) Achievement records or grades should not be based
on such a test. 

In another case several tests might have been studied for their 
relationship to general ability in a sport. Each of them may show
fairly high validity coefficients, i.e., high relationship to this
general ability. Part of the tests may correlate highly with some of the
others, and some may show very low correlations with others. If 
a combination of tests is to be used, then those which have high in- 
tercorrelations would be discarded because they measure the same 
or similar qualities. Those with low intercorrelations would be 
selected because they measure different aspects of the general 
ability. 

The above principle explains the basis for construction of test 
batteries, or series of tests, to measure or predict a general ability. 
The validity of the total battery is computed by a multiple corre- 
lation instead of by the procedure just outlined. In order to get 
a high validity from such a combination it is essential that each test 
be good, i.e., reliable and reasonably valid. It is also necessary that 
the intercorrelations be low. Let us illustrate these statements with 
a rather obvious example. Suppose you have a box; you know 
nothing about its contents and it cannot be opened. One means 
of describing it to someone else would be by its shape or objec- 
tively by its external measurements. Suppose you use a good cloth 
tape to determine its length, breadth, and thickness. That would 
certainly be descriptive but it would add nothing to your further 
description of the box and its contents to proceed to use a flexible 
steel tape and measure it again. You would simply have two 
measures of the same thing, i.e., two measures with high inter- 
correlations. It would be much more helpful to determine its 
weight as a second measure. While size and weight may tend to 
be related, there doubtless would be a lower intercorrelation 
between weight and size than between size as measured by the 
cloth tape and size as measured by the steel tape. Knowing 
weight and size we now have a better clue to the nature and 
amount of the contents than was known before. And so additional 
qualities may be measured if the means exist for that measurement,
and additional measurement will be valuable if it does not
duplicate something already done. 

Multiple correlation coefficients are expressed in the same form 
as those of the simple correlation, but are always positive. Some- 
what higher values of multiple coefficients are expected than in the 
case of a single test since more measurements are being considered. 
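
The advantage of low intercorrelations can be seen in the simplest
case, a battery of two tests. The sketch below uses the standard
two-predictor formula for the multiple correlation,
R squared = (r1 squared + r2 squared - 2 r1 r2 r12) / (1 - r12 squared),
where r1 and r2 are the validity coefficients of the two tests and
r12 is their intercorrelation. The coefficients are invented, and the
function name is hypothetical.

```python
from math import sqrt


def multiple_r(r_c1, r_c2, r_12):
    """Multiple correlation of a criterion with two tests.

    r_c1, r_c2: validity coefficients of test 1 and test 2.
    r_12: intercorrelation of the two tests (must not be +1 or -1).
    """
    r_squared = (r_c1 ** 2 + r_c2 ** 2 - 2 * r_c1 * r_c2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Two hypothetical tests, each with a validity of .60.
for r_12 in (0.9, 0.5, 0.2):
    print(r_12, round(multiple_r(0.60, 0.60, r_12), 3))
```

With the validities held constant, the battery coefficient rises as
the intercorrelation falls, which is the case of the tape measure and
the scales in the example of the box.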

The multiple correlation tells which tests to combine in a bat- 
tery to measure general ability but it does not tell how to com- 
bine them. Frequently the units in which one test is scored are 
so different from those of another that one would completely domi- 
nate the final score if they are simply added together. For example, 
you might wish to combine scores from a softball throw for distance 
where the average score was 60 feet and a target throw where the 
average score was 15. The score on the distance throw is always so large
that variations in accuracy on the second test would fail to show 
up in the total score. It might also happen that accuracy was more 
important in the estimate of ability than strength for a distance 
throw. The scores are almost always weighted, therefore. This is 
done by the regression equation which specifies the proportion of 
the raw scores to be used. 

The other alternative to the use of the equation when com- 
bining tests is to take the student’s T-score for each test and add 
these together. (See p. 222 for explanation of T-scores.) Both pro- 
cedures take into consideration the range of all scores and the 
ability of the student with respect to the rest of the group. The 
sum of the T-scores is computed more quickly and is very satis- 
factory. 
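
The difference between adding raw scores and adding T-scores can be
sketched as below. The sketch assumes the simple linear form
T = 50 + 10(score - mean) / (standard deviation), computed on the
class itself; the procedure given on p. 222 may differ in detail. The
four students and their scores are invented.

```python
from statistics import mean, pstdev


def t_scores(raw):
    """Convert raw scores to T-scores (mean 50, standard deviation 10)."""
    m, sd = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / sd for x in raw]


def combined_t(*tests):
    """Sum each student's T-scores across several tests."""
    converted = [t_scores(t) for t in tests]
    return [sum(per_student) for per_student in zip(*converted)]


# Hypothetical class of four: distance throw in feet, target throw in points.
distance_ft = [75, 60, 45, 60]
target_pts = [8, 15, 22, 15]

raw_sums = [d + t for d, t in zip(distance_ft, target_pts)]
t_sums = combined_t(distance_ft, target_pts)

print(raw_sums)
print([round(s, 1) for s in t_sums])
```

In the raw sums the long thrower leads simply because feet are larger
numbers than target points. In the T-score sums the first and third
students come out even, since each is as far above the class on one
test as below it on the other.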

TESTS SHOULD PROVIDE A MEANS 
OF INTERPRETING PERFORMANCE 

The average and mean are synonymous terms. The 
median score is that point in the scale that is excelled by fifty per 
cent of the cases and is better than fifty per cent. The mean and 
median, as measures of central tendency, are not particularly use- 
ful to the teacher who is concerned with individual performance. 
The measures of distribution (or difference from the mean or 
median) are much more valuable. These measures are the quartile,
percentile, and the standard deviation. The quartiles locate the 
scores between which twenty-five per cent of the cases fall, in other 
words each half created by establishment of the median is again 
halved. See Figure 3 where each quartile contains twenty-five per 
cent of the cases and point 2 is the median. It will be observed that 
the score range for A-1, 1-2, 2-3, 3-Z may not be the same. A student
understands very readily when told in which fourth of the class
his score falls.

Figure 3. Quartiles (25% of the cases in each)
A-Z: range of scores
A-1: lowest or first quartile
1-2: second quartile
2-3: third quartile
3-Z: upper or fourth quartile

Percentiles follow the same plan with subdivisions into smaller 
than twenty-five per cent. A student also understands readily what 
is meant if told that his score is on the 65th percentile, i.e., it is 
better than sixty-five per cent of the group but surpassed by thirty- 
five per cent. 
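
The quartile and percentile descriptions above can be sketched in a few lines of Python. This is only an illustration, not a procedure taken from the text: the function names and the class scores are hypothetical, higher scores are assumed to be better, and the rule that a percentile rank counts only the scores strictly below the student's own is an assumption.

```python
def percentile_rank(score, group_scores):
    """Per cent of the group scoring strictly below `score` (assumed tie rule)."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


def fourth_of_class(score, group_scores):
    """Return 1 for the lowest fourth of the class up to 4 for the highest."""
    rank = percentile_rank(score, group_scores)
    return min(4, int(rank // 25) + 1)


# Hypothetical scores for a class of twenty students (higher is better).
class_scores = [12, 15, 15, 18, 20, 21, 23, 24, 26, 30,
                31, 33, 35, 36, 38, 40, 41, 44, 47, 50]

rank = percentile_rank(36, class_scores)
print(f"Percentile rank: {rank:.0f}")
print(f"Fourth of class: {fourth_of_class(36, class_scores)}")
```

With these made-up scores, a student scoring 36 is better than thirteen of the twenty students, which is the 65th percentile and the third fourth of the class.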

The standard deviation is a yardstick commonly applied to any 
symmetrical bell-shaped distribution, approximating the normal 
curve. The curve is bell-shaped because the frequencies decrease as 
the amount of deviation from the center increases in each direc-
tion. Since the standard deviation is a “yardstick,” the units on it
are the same length throughout the length of the scale. Each stand- 
ard deviation is named according to its deviation from the center 
or the mean. Because of that specific location, each has a specified 
number of frequencies in it (Figure 4), and is equaled only by its 
partner on the opposite end of the scale. The length of each stand-
ard deviation is uniform within a distribution, but it differs from
one distribution to another, since the total range varies and the distribu-
tion of cases through the range varies. The size of the standard
deviation increases when the total range increases, or when the 
scores fall with greater frequency near the extremes. 
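
The last two statements can be checked with a short computation. The sketch below is illustrative only: the three sets of class scores are invented, and the standard deviation is computed by dividing by the number of cases (the population form), which is an assumption about the formula intended.

```python
import math


def standard_deviation(scores):
    """Population standard deviation: root of the mean squared deviation."""
    m = sum(scores) / len(scores)
    return math.sqrt(sum((s - m) ** 2 for s in scores) / len(scores))


# Three hypothetical classes, each with a mean of 50.
narrow_range = [48, 49, 50, 50, 50, 51, 52]    # range 48 to 52
wide_range = [40, 45, 50, 50, 50, 55, 60]      # range 40 to 60
near_extremes = [40, 40, 41, 50, 59, 60, 60]   # same range, scores piled at the ends

for name, scores in [("narrow range", narrow_range),
                     ("wide range", wide_range),
                     ("near extremes", near_extremes)]:
    print(f"{name}: {standard_deviation(scores):.2f}")
```

The second class has a larger standard deviation than the first because its range is wider; the third has the same range as the second but a still larger standard deviation because more of its scores lie near the extremes.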

The T-scale is an easy and practical interpretation of the
standard deviation of a distribution, and is the basis for many
achievement scales. The mean is always 50. Each standard devia- 
tion equals 10 points on the T-scale. Therefore, the limits of the 
scale are approximately 20 and 80. Likewise, it is apparent that 
about two thirds of the class will have scores between 40 and 60 
and that there will be very few 20’s and 70’s. A simple explanation
to the student suffices. For example, 50 is average on the test; 
above 50 is better than average, the higher the better; below 50 
is less than average, the lower the score the poorer it is. 
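
The rule just described (mean at 50, ten points for each standard deviation) can be written as a one-line conversion. The sketch below assumes that linear rule; it is not the construction procedure explained elsewhere in the book, and the function name, the class mean and standard deviation, and the sign reversal for timed events are all assumptions made for illustration.

```python
def t_score(raw, group_mean, group_sd, lower_is_better=False):
    """Linear T-score: mean maps to 50, one standard deviation to 10 points."""
    deviation = (raw - group_mean) / group_sd
    if lower_is_better:          # e.g. a timed race, where a smaller raw score is better
        deviation = -deviation
    return 50 + 10 * deviation


# Hypothetical distance throw: class mean 60 feet, standard deviation 12 feet.
for feet in (36, 48, 60, 72, 84):
    print(feet, "feet ->", round(t_score(feet, 60, 12)))
```

A throw one standard deviation above the class mean (72 feet here) converts to 60, and one at the mean converts to 50, whatever the units of the raw score.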



Figure 4. Standard Deviations

M: mean of the distribution
+: above the mean
-: below the mean
1: first standard deviation from mean
2: second standard deviation from mean
3: third standard deviation from mean

The chief advantage of T-scores is that a direct comparison of 
performance on different tests is possible. It is impossible to say 
whether a 50 foot throw is better than 10 points on a target test 
unless each is interpreted as good or poor. The T-score does this 
on the basis of standing with respect to the rest of the group. Such 
a comparison is invaluable in diagnostic testing and in motivating 
student effort. 

The T-scores may also be added together for a composite score 
as mentioned previously. There is adequate evidence that relative 
standings are almost unchanged when regression equations are 
used as compared to the sum of the T-scores. (Correlations by the 
authors on such comparisons range from .94 to .98.) 
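
A small example may show why T-scores are added rather than raw scores. The sketch below is hypothetical throughout: the three students and their scores are invented, and the T-scores use the linear rule (mean 50, ten points per standard deviation) with the standard deviation taken over the whole group.

```python
from statistics import mean, pstdev


def t_scores(raw_by_student):
    """Convert one test's raw scores to linear T-scores (mean 50, SD 10)."""
    values = list(raw_by_student.values())
    m, sd = mean(values), pstdev(values)
    return {name: 50 + 10 * (raw - m) / sd for name, raw in raw_by_student.items()}


# Hypothetical class of three students.
distance_feet = {"Ann": 50, "Bea": 60, "Cora": 70}   # throw for distance
target_points = {"Ann": 20, "Bea": 15, "Cora": 10}   # throw for accuracy

raw_sum = {n: distance_feet[n] + target_points[n] for n in distance_feet}
t_dist, t_targ = t_scores(distance_feet), t_scores(target_points)
t_sum = {n: t_dist[n] + t_targ[n] for n in distance_feet}

for n in distance_feet:
    print(f"{n}: raw sum {raw_sum[n]}, T-score sum {t_sum[n]:.1f}")
```

In the raw sums the ranking simply follows the distance throw, because its scores are larger and more spread out. In the T-score sums each test carries equal weight, so Ann's superior accuracy offsets Cora's superior distance.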

Some achievement scales range from 0 to 100. They are con-
structed on the same plan except they are based on the probable
error rather than the standard deviation. The probable error is
.6745 of the standard deviation; therefore, the distribution has
a range from -5 probable errors to +5 probable errors. With
one probable error equal to 10 points the range is 0 to 100.
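
The two kinds of scale can be compared side by side for the same performance. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the 0 to 100 scale is assumed to place the mean at 50, which follows from a range of five probable errors on either side at ten points each.

```python
PROBABLE_ERROR = 0.6745  # one probable error, in standard deviation units


def t_scale(sd_units):
    """T-scale score for a performance `sd_units` standard deviations from the mean."""
    return 50 + 10 * sd_units


def probable_error_scale(sd_units):
    """0-to-100 scale: 10 points per probable error, mean assumed to sit at 50."""
    return 50 + 10 * sd_units / PROBABLE_ERROR


for sd_units in (-3.0, -1.0, 0.0, 1.0, 2.8):
    print(f"{sd_units:+.1f} SD: T-scale {t_scale(sd_units):.0f}, "
          f"probable-error scale {probable_error_scale(sd_units):.0f}")
```

A performance three standard deviations below the mean scores 20 on the T-scale but below 10 on the probable-error scale, while one well above the mean falls in the 70's on the T-scale and in the 90's on the other.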

Either type of scale may be used according to the wish of the 
teacher and whichever type is available, if using scales from other 
groups. There seems to be one advantage to the T-scale, particu- 
larly when used for stimulating student effort. The poor students 
may have tried hard and actually received some score on the test. 
If the probable error scale is used the reward may be less than 10 
and may be very discouraging. If the T-scale is used the reward 
is in the 20’s. Such a score seems to give encouragement. At the 
other extreme a student who makes in the 90’s is apt to feel satis- 
fied and not make much further effort. On the other hand if the 
T-scale is used, which places him in the 70’s, it seems to point up 
the fact that there is still room for improvement. 


BIBLIOGRAPHY 

1. Cassidy, Rosalind: New Directions in Physical Education for the Adolescent
Girl in High School and College. A. S. Barnes & Company, New York, 1938,
pp. 124-139

2. Davis, Elwood C., and Lawther, John D.: Successful Teaching in Physical Edu-
cation. Prentice-Hall, Inc., New York, 1941, Chapter 22

3. Glassow, Ruth B., and Broer, Marion R.: Measuring Achievement in Physical 
Education. W. B. Saunders Company, Philadelphia, 1938, Chapters 1 and 2 

4. National Education Association, American Education Research Association: Re- 
view of Educational Research, “Educational Tests and Their Uses,” 8, December 
1938, Chapters 4 and 6 

5. Williams, Jesse Feiring, and Brownell, Clifford Lee: The Administration of 
Health and Physical Education. W. B. Saunders Company, Philadelphia, 1940, 
Chapter 19 



Administration of Tests 


There are numerous conditions which affect testing, 
and since many of these vary for the different tests and teaching 
situations, only those principles fundamental to all testing will be 
discussed in this chapter. Suggestions for specific tests and for 
adaptations to local situations will be found in Chapters 4, 5, 6, 
and 8. A few conditions which vary from one situation to an- 
other are the number of persons to be tested, the time allotment, 
the amount of floor, field, and wall space, the amount of equip- 
ment, and the use to be made of test scores. 

CLASS ORGANIZATION 

The amount of time that it takes to administer a test 
can be reduced considerably by careful organization. First, we will 
consider the organization and administration of motor tests, with 
knowledge tests coming later in the chapter. Some tests can be 
administered along the edges of the floor or field, without inter- 
fering with the progress of the class activity, while for others it 
may be necessary to use the entire space for testing during a portion 
of the class period. In the former situation, student leaders and 
the squad type of class organization are most helpful. The squads 
or teams used for class instruction can be the unit for testing. The 
instructor should prepare instructions and plan the demonstration 
in advance and then present them to the entire class at one time. 
For some tests, practice trials are necessary. Care should be taken 
to see that all receive an approximately uniform amount of prac- 
tice. All this can be done before dividing into groups. The testing 
may take a portion of several class periods or one may prefer to 
devote a few consecutive periods to it. No one plan is best for all 
situations and for all tests. 





USE OF SPACE 

A careful study should be made of all available space 
and plans made for its efficient use. If several persons can be tested 
at once, the amount of time necessary for conducting the tests is 
reduced. For example, if a test for accuracy in softball throwing is 
being conducted, instead of having just one target, arrange for as 
many as space and balls will permit, being careful to allow enough 
distance between targets for the scorers to stand without danger of 
being hit. 

Another way to speed up the testing is to arrange areas for prac- 
tice trials at one side if limited equipment is available for the 
actual administration of the test. An example of this is the badmin- 
ton serve test described in Chapter 4. Strings can be tied to a post 
(standard) placed off the courts, one at net height and another 
twenty inches above. Marks indicating the floor plan can be 
chalked or painted on the floor. Oilcloth targets, to be placed on 
the floor, adapt nicely to this procedure as they can be moved at 
will. This arrangement permits practice space before going on the 
court for the actual trials. A similar arrangement can be con- 
structed for practicing the badminton clear test. 

TRAINING THE ASSISTANTS 

In large scale testing, or where student assistants 
are to be used in conducting and recording, at least one organiza- 
tional meeting should be held. The purpose of this meeting is to 
give everyone a complete understanding of the entire process. The 
instructions should be explicit and all should be impressed with 
the necessity of having uniformity of procedure. Definite typed or 
mimeographed instructions should be prepared and given to each 
assistant, to be reviewed just prior to the actual testing. 

All dimensions given by the test author should be carefully fol- 
lowed, if you expect to secure comparable results. If you are only 
interested in making comparisons within your own group, then 
alterations in the tests can be made. Time intervals, amount of 
practice, and the number of trials should be held constant. Some-
times the wall space is interrupted so much by windows, stall bars,
overhanging balconies, doors, and other fixtures that you can only
approximate the dimensions. When this is the case, you cannot 
be sure that the test results will be either as reliable or as valid 
as found when administered under more ideal conditions. 

Scorers often need special training. This is particularly true 
when they have to make a split-second decision as to where the 
object landed. Various devices have been tried in an effort to 
secure greater accuracy in scoring. Brophy 1 made an accuracy 
target with four concentric circles, alternating tin with wall space 
to secure sound effects. (See p. 16 for description of a portable 
target.) Schmithals 2 placed a wooden board across the goal posts 
to assist in determining the exact second that the ball reached the 
goal line. A large target has been used successfully for a volley ball 
serve test. Some of these special devices are cumbersome to move 
and require extra time and materials for construction. The scorers 
should experiment during the training period until they find the 
spot where visibility is best. For tests involving wall targets, stand- 
ing to one side and about ten feet from the target works well. The 
object should be seen in flight and the scorer should shift his gaze 
directly to the target before the contact. Sometimes it is wise to 
have both a spotter and a recorder, with the recorder repeating 
the score aloud before recording, as a further check on accuracy. 

The training of assistants is particularly important in events 
where form is to be judged or fouls called. It is wise to have them 
go through the tests themselves, emphasizing the correct and the 
wrong points, and then practice judging each other until the per- 
son in charge of testing is certain that they are judging with a high 
degree of uniformity. 

If the assistants are to help in motivating the students to their 
best efforts, then they should be given hints in doing this. The 
manner in which the tests are presented is extremely important, 
and will be discussed later. Giving words of encouragement, hav- 
ing a sincere interest in the individual’s score, and showing pleas- 
ure at extreme effort are but a few of the ways in which assistants 
can motivate students. Care must be taken not to embarrass the
poorly skilled persons or to make the actual score seem the all-
important thing. On the other hand, unless the students are
motivated to their very best efforts, the scores are meaningless, and
the time spent in testing might better have been spent otherwise.

1, 2 See reference in bibliography at end of this chapter.

If the test directions call for stop watches, assistants should be 
given instructions concerning their use and how to read them. 
All stop watches should be checked by a jeweler immediately 
before the tests are conducted, if there is any discrepancy between 
them or doubt as to their accuracy. 

DEVICES FOR FACILITATING MEASUREMENT 

Targets are used in many tests. A satisfactory method 
to use in placing them on walls or floors is to paint them. A quick 
drying, washable paint should be used. Show card or poster paint, 
which can be purchased in a wide variety of colors, works very well. 
It can be removed with a damp cloth and lasts indefinitely on wall 
surfaces or on floors until they are scrubbed. Some colors may leave 
stains if applied to a porous surface and should be pre-tested. In 
making targets of concentric circles, a string tied to a piece of chalk 
can be used to outline the pattern. Knots can be tied in the string, 
corresponding to the various radii. Care should be taken to select 
string that does not stretch easily. The directions for tests usually 
indicate the width of lines; a paint brush of the same width is 
convenient. 

The use of different colors aids in scoring. Another aid is the 
painting on the target of the score value for each space. 

Adhesive tape, torn in proper widths, can be used in constructing 
wall targets involving straight lines, such as the repeated volleys 
test (see p. 101) or in a tennis serve test. This is a quick but rather 
expensive procedure. 

If you wish to be able to move a target from place to place, the 
target can be made of some lightweight material, such as oilcloth. 
Crayola or colored chalk can be used on the rough side of oilcloth 
or the smooth side can be painted, or outlined with India ink. 
This type of target can be used on dirt or grass surfaces, or placed 
on flooring. 

Ropes, strings, or poles are used in some tests to insure certain 
heights or distances. Two bamboo poles, placed across jumping
standards can be used to outline a target. A clothesline rope is used
in the badminton clear test, placed across the floor fourteen feet 
from the net and parallel to it, at a height of eight feet from the 
floor. The rope can be attached to high standards, such as those 
used in the pole vault or for tetherball, provided that their bases 
are heavy enough so that the weight of the rope will not cause them 
to tip slightly, thus allowing the rope to sag. Screw rings can some- 
times be placed in the walls; in the case of brick walls, the screw 
can be inserted in the mortar between bricks. The strings used in 
the badminton serve test can be tied to the standards holding the 
net. If these standards are not high enough, they can easily be 
extended by taping lightweight sticks, such as yardsticks, to the 
standards. Sandbags can be placed on the bases of standards if 
necessary to hold them erect. 

In tests involving throws, kicks, or hits for distance, time will be 
saved if the field is laid out so that the contestants can start from 
either end. Lines parallel to the starting line can be marked, thus 
permitting quite a few to be tested at once, providing the field is 
wide enough and the supply of balls is ample. Markers can be 
placed along both sidelines, indicating the value of the zones, pro- 
gressing from the zero or starting line to the highest score. These 
markers should be placed along the right hand sideline. (Figure 
5) The lines across the field can be made with lime, using regular 
field liners; or with string or linen tape, tied to sharp objects and 
pulled taut to the ground. If the number of subjects or trials is 
small, these lines are not essential. Small objects, such as sticks or 
stones, can be used to mark the landing spot. If each contestant is 
to have three trials, with only the best effort being measured, a 
pointed stick (tongue depressors are convenient), bearing the con- 
testant’s number or initials, can be inserted at the spot. Measure- 
ments can then be made when all trials have been completed. 

In some track and field events, such as the discus throw for boys, 
and the shot put for boys, the regulations call for measuring 
directly from the starting point to the spot where the object first 
broke the turf. Here there is no desire to penalize for lack of accu- 
racy in direction. It is recognized that when zones are laid off by 
marking lines parallel to the starting line that there is a penalty 
for gross inaccuracies. To avoid this, for the events mentioned 



Figure 5. Parallel Field Markings for Distance Events




of time. Thus, one stop watch can be used for the entire group 
being tested. A spotter is assigned to each runner, and is expected 
to be opposite the contestant at the end of the time interval. If 
properly trained in observing running skill, the spotter will be 
able to make adjustments in position for judging the various 
speeds. Markers which can be easily read should be placed along 
both sidelines, so the spotter can score the finish by looking across 
the lanes. (Figure 7) 

Another method for timing several individuals with only one 
stop watch available, provided the time is to be taken to the near- 
est full second, is to have the timer, stationed near the finish line, 
count the number of seconds aloud. The spotters watch the indi- 
vidual runners or swimmers and record the time called imme- 
diately before the individual reached the finish line. 

When stop watches are not available, a metronome can be used. 
The sound can be supplemented by drum or by voice. The met- 
ronome can also be used in standardizing rate in such tests as 
endurance tests where a certain exercise is continued for a con- 
siderable period of time. 

Lines on the mat covers can be used for convenience in scoring 
such events as the standing broad jump. If only a few lines are
drawn and measurement is desired to the nearest inch, a ruler may 
be used to measure from the nearest line to the imprint of the foot. 

In conducting an event such as the high jump, where the total 
number of trials is not predetermined, it is suggested that the 
group be divided into squads according to ability. If this is un- 
known, then divide them according to height, sending the taller 
jumpers to one set of standards and the shorter to another. If 
only one set of standards or one jumping pit is available, small 
squads should be used and they should be rotated to various events, 
in order to eliminate the time wasted while standing to wait for 
turns. 

SCORE CARDS 

Score cards should be prepared in advance, with the 
names of the persons to be tested and the number of trials indi-
cated. Where scores are to be converted into some other form, a
space should be provided for the converted score, adjacent to the
total raw score. A box of pencils should be placed in a convenient 
spot. 

Various types of score cards have been used, and each has its 
merits. If scores are to be used from one quarter or semester to 
another, then the individual score card, which can be turned over 
to the new instructor or re-alphabetized in the new grouping, is 
preferable. A sample of such a card is shown below: 


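NAME ________________________________    Classification ____________
       Last       First       Middle

Fitness Test Scores

Date     Physical Education     Sit-up     Bouncing     Chinning     Run
         Class Activity
____     __________________     ______     ________     ________     ____
____     __________________     ______     ________     ________     ____
____     __________________     ______     ________     ________     ____

Ht. ____   Wt. ____   Age ____   Health Rating ____   Posture ____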


Figure 8. Sample of Individual Score Card 

Squad cards are convenient and economical, when the testing 
is being done with the squad as a unit. Sometimes the card used for 
taking roll will have sufficient spaces for recording test scores. 
When the number of trials is five or more, the addition of scores 
will be facilitated if the card is arranged so that the scores can be 
recorded in a vertical column rather than a horizontal row. 

If performance levels are available for the test, compiled from 
scores made by groups similar to the group being tested, it is help- 
ful to have them placed on the individual score card. They make 









TABLE I

Achievement Scales for College Women (a)

Arranged for Construction of Student Profile

T-Score   Obstacle race (b)   Pull (c)   Push-ups (d)   Sit-ups (e)   Bouncing (f)   (g)   T-Score

81 



41-up 

io8-up 

2 10-up 

81 

8o 



40 



80 

79 

17.8-17.9 

86 

39 

102-107 


79 

78 



38 

96-101 

200-209 

78 

77 


85 

37 

84-95 

190-199 

77 

76 

18.0-18.1 

84 

36 

72-83 

170-189 

76 

75 

18. *-18.3 

83 

34 

70-71 

160-169 

75 

74 

18.4-18.5 

81 

3 * 

68-69 

157-159 

74 

75 

18.8-18.9 

80 


64-67 

154156 

73 

7 * 


79 

3 > 

62-63 

151-153 

7 * 

7 * 

19.0-19.3 

77 

30 

58-61 

142-150 

7 » 

70 

‘ 9 - 4 - 19-5 

76 

29 

56-57 

139-141- 

70 

69 

19.6-19.7 

75 

28 

54-55 

133138 

69 

68 

>9-8-19.9 

74 

*7 

5*53 

127-132 

68 

67 

20.0-20. 1 

73 

26 


121-126 

67 

66 

20.2-20.3 

72 



115-120 

66 

65 

20.4-20.5 

7 > 

*5 

5 °- 5 » 

112-114 

65 

64 


70 

*4 

46-49 

109-til 

64 

63 

20.6-20-7 

69 

*3 

44-45 

106-108 

63 

6s 

20.8-20.9 

68 

22 

4*'43 

103-105 

62 

6l 



21 



6l 

60 

21 . 0 -^ 0 ) 

67 


40-41 

100-102 

60 

59 


66 

20 


97-99 

59 

58 

2 1.2-21. A 

65 

>9 

38-39 

( 9 & 9 6 

58 

57 

21.4-21.5 \ 


18 

36-37 


57 

56 

21.6-21.7 \ 

64 

17 


/88-go 

X) 5® 

55 

21.8-21.9 

\ 65 

16 

34-35 

/ 85-87 

55 

54 

22.0-22.1 

\ 6 « 



/ 82-84 

54 

53 


V* ^ 


32-33 

/ 78-81 

53 

5 * 

22.2-22.3 


'A 


/ 76-77 

5 * 

5 » 

224-22.5 

< 8 > 

\ 


/ 

5 » 

50 


59 

>3 \ 

30-31 J 

' 73*75 

50 

49 

22.6-22.7 


\ 

V / 

70-72 

49 

48 

22.8-22.9 

58 

12 

\ 28-29 / 


48 

47 

23.0-23.1 


n 

>26-27/ 

67-69 

47 

46 

23.2-23.3 

57 



64-66 

46 

45 

234-23.5 

56 

10 

* 4 pa 

61-63 

45 






44 

23.6-23.7 

55 




44 

43 

23.8-23.9 




58-60 

43 

4 * 

g 4 . 0 -g 4.1 

54 

9 

22-23 


4* 

4 » 

24.2-24.3 

53 



55-57 

41 

40 

24.4-24.5 


8 

20-21 

5*54 

40 

39 

24.6-24.7 

5 * 

7 



39 

38 

24.8-24.9 

5 » 



49-51 

38 

37 

25.0-25.1 

50 

6 

18-19 


37 

36 

25.2-25.3 




46-48 

36 

35 

* 5 - 4 * 5-5 

49 

5 

16-17 

43-45 

35 

34 

25.6-25.7 

48 



40-42 

34 

33 

25.8-25-9 

47 

4 

14-15 


S3 

3 * 

26.0-26.1 




37-39 

32 

3 > 

26.2-26.3 

46 

3 

12-13 

34-36 

S> 

30 

26.4-26.7 

45 



* 8-33 

30 

*9 

26.8-86.9 

44 


10-11 

25-27 

29 

28 

27.0-27.5 


2 



28 

27 

27.6-28.3 

43 


8-9 

22-24 

*7 

*6 

28.4-28.7 



6-7 

19-21 

20 

*5 

28.8-29.1 

4 * 

1 



*5 

*4 

29.2-29-5 

40 


4-5 

16-18 

*4 

*3 

29.6-29.9 





*3 

22 

30.0-30.9 

39 



10-15 

22 

21 






21 

20 






20 

*9 

31.0-32.0 

38 


2-3 

4-9 

*9 


(a) Scales constructed on data obtained from $87 freshmen and sophomore women at University of Iowa.

(b) See page 136 for description.

(c) See page 221 for description.

(d) See page 119 for description. Subject continues as long as possible.

(e) Sit-up was taken from a lying position, legs straight and ankles held down by partner; hands were
placed on shoulders with elbows close to sides. One score for each time the subject came to an erect sitting
position, continuing as long as possible.

(f) See page 123 for description. Subject continues as long as possible.

(g) Construct your own score card by substituting for any of the preceding tests and inserting your own
scale, and add columns for any item or items you wish to include in the battery.


it possible for a student to convert his raw score on an event into 
a point score. This enables him to evaluate his own performances 
and also provides him with a measuring stick against which to 
evaluate his own improvement. If it is not possible to have printed 
score cards, a conversion table can be posted for students’ reference. 






Many branches of the armed services have their achievement 
scales constructed for their own group since there is considerable 
variation in the amount of conditioning activity given different 
service units. Such an individual score card lends itself to the 
making of a profile for the student. The profile is diagnostic in that 
it compares abilities as measured by the tests. The services consider
these profiles good motivation devices.

Table I is presented as a suggestion for an individual score card 
and has a profile drawn in for a student with the following record: 
obstacle race 21.1 seconds; pull 60 pounds; push-ups 15; sit-ups 
25; bouncing 94; endurance, a T-score of 56. Observation of this 
profile reveals the fact that she is above average in every item 
except sit-ups, and that her best performance is in the agility test 
or obstacle race. A composite of all tests in the battery may also be 
included in the scale with a similar ranking on the total. 

The reverse side of the card may be used for a cumulative 
record, other test scores or important data. If it is small and com- 
pact it is convenient for the student. 

PRESENTATION OF TESTS 

Before any testing is done, it is important to inform 
the subjects of the purpose of the test and how the results will be 
used. It is essential that they be interested and desirous of putting 
forth their best efforts. The attitude toward tests is usually a reflec- 
tion on the selection of tests, the administration of them, the use of 
test results, or the conditioning of the students. An unfavorable atti- 
tude will exist if the tests selected have been uninteresting or mean- 
ingless, or if an undue amount of time has been taken away from the 
activity for testing. This will be true also if the students think that 
too little use has been made of the results, or if the tests have been 
given when they were not in physical condition for them, resulting 
in stiffness and soreness. Some groups will be motivated by an- 
nouncing in advance that the three best scores on each team or 
squad will be posted. If the tests are to be used as a partial basis for 
marking, this should be announced, also. 

The same instructions should be given to all groups. Some tests 
include written directions, to be read to the groups. When such direc-
tions are not available, the following principles * should be fol-
lowed in preparing a set of directions.

1. The instructions should be as brief as possible, yet give an 
adequate understanding of what is to be done. 

2. The instructions should make use of a demonstration.

3. The instructions should be adapted to the understanding 
of all being tested. 

4. The instructions should be broken into units, and
should be in the order of doing.

5. The instructions should equalize interest and secure maxi- 
mum effort of all. 

The instructions to pupils should be accompanied by instruc- 
tions to examiners, mentioned earlier in the chapter under the 
heading “Training the Assistants.” They should specify whether 
practice is permitted; if so, the number of trials. They should also 
provide for an adequate and uniform amount of rest between trials,
so that fatigue will not affect the scores. 

A demonstration of the test is usually advisable. The points to 
be emphasized should be thought out in advance and their order 
planned. An example of this is the Dyer wallboard tennis test, 
referred to further in Chapter 4. 

“Demonstrate the following points: †

1. Two balls in hand. 

2. Start test by dropping ball, letting it hit floor at least once, 
then play it. 

3. Rally a few times, showing volley. 

4. Cross restraining line to retrieve a ball, use a low hit to keep 
it in play, and retreat for the next shot. 

5. Make a wild shot to show how taking another ball saves 
time. 

6. Put this new ball in play as at the start.” 

All the points that apply in presenting a good demonstration for
instructional purposes apply here, such as having the group placed
where they can see and hear, etc. An opportunity for questions
should be given after the demonstration.

* Adapted from McCall, William A.: How to Measure in Education, Macmillan
Company, New York, 1922, pp. 235-248.

† Quoted by permission of the Research Quarterly.

KNOWLEDGE TESTS 

Care needs to be taken in administering knowledge 
tests as well as motor tests although the procedure is relatively 
simple. All too frequently teachers, through thoughtlessness or 
lack of information, fail to administer them in such a manner that 
all have a fair chance. 

The room should be quiet, well ventilated, and adequately 
lighted. (Usually, if arrangements are made far enough in advance, 
physical education teachers can secure the use of a class room.) The 
seats should be well-spaced or students seated in alternate seats. 
This is of less importance when answer sheets are used, particularly 
those of the type shown on page 46, where copying is reduced to 
a minimum. 

Books and wraps should not be brought into the room. A check 
should be made to see that all are supplied with pencils. 

Answer sheets and mimeographed directions for their use can 
be handed the students as they enter the room. Giving test direc- 
tions orally should be avoided. The directions, if general and to be 
used for all tests regardless of the activity, can be placed on a sepa- 
rate sheet from the test form. When the directions are specific to a 
particular test, they can be placed on the test form itself. An ex- 
ample of general directions to be placed on a separate sheet is 
included in this chapter. 

Be sure that each student receives but one copy of the test ques- 
tions and that all test forms are collected at the end of the examina- 
tion period. If all test forms are numbered consecutively, a check 
can quickly be made to see that all have been returned. Students 
should be asked to record the number of the test form on the an- 
swer sheet. Since this enables the one in charge to locate the 
student who has failed to hand in the test form, it discourages the 
practice of carrying them away. As further insurance against allow-
ing questions to get into circulation, require each student to per-
sonally hand in the three forms: answer sheet, directions, and test
form. Have students hand in papers when finished; this tends
to avoid copying of questions.
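
Where the records are kept on a machine, the check on numbered test forms can be sketched as follows. This is an illustration only: the function name, the class list, and the idea of keeping the answer-sheet entries as a table of form number against student name are all assumptions.

```python
def unreturned_forms(issued_numbers, returned_numbers, answer_sheets):
    """Map each missing test-form number to the student who recorded it.

    `answer_sheets` maps the form number written on an answer sheet to the
    student's name; a form nobody recorded is reported as None.
    """
    missing = sorted(set(issued_numbers) - set(returned_numbers))
    return {number: answer_sheets.get(number) for number in missing}


# Hypothetical class: forms 1 to 8 were issued, two did not come back.
issued = range(1, 9)
returned = [1, 2, 3, 5, 6, 8]
sheets = {1: "Adams", 2: "Baker", 3: "Clark", 4: "Davis",
          5: "Evans", 6: "Frank", 7: "Green", 8: "Hall"}

print(unreturned_forms(issued, returned, sheets))
```

Because each answer sheet carries the number of the test form, the missing numbers lead directly to the students concerned.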

No help should be given in the interpretation of questions, as 
this not only tends to give an unfair advantage to the one asking 
the question but also causes interruptions that are distracting. 

If all are to have the same length of time, test forms can be dis- 
tributed, face down, and all told to wait for a signal before starting. 

The following sample of Directions should serve as a guide. 

DIRECTIONS 

You are to use an Answer Sheet to mark your answers to all 
of the questions in this test. Take the answer sheet now and print your 
name, classification, name of activity in which you are being tested, etc. 
Write the number of the test form in the blank space in the upper left 
hand corner. Then, finish reading these directions. 

Exercises 1-50 in this test are of the multiple choice type, consisting of a 
question followed by several possible answers, only one of which is correct, 
or one of which is definitely better than any of the others. On the Answer 
Sheet you will find as many sets or rows of brackets ( ) as there are ques- 
tions in the test. The number to the left of each row of brackets corresponds 
to the number of the question. 

To answer a question, first decide which is the best answer, then find the 
row of brackets on the Answer Sheet numbered the same as the question. 
Then mark a cross (X) in the brackets corresponding to the correct or best 
response, counting from the left. If the first response is correct or best, 
place a cross in the first brackets in the set; if the second response is correct 
or best, place a cross in the second brackets, etc. 

Exercises 51-75 are of the true-false type. If the statement is entirely true, 
place an (X) in the first brackets; if partially or entirely false, place the 
cross in the second set of brackets. All omissions will be counted as errors. 
The sample questions have been marked correctly on the Answer Sheet. 

Answer the questions in the order in which they are given, but do not 
linger too long over difficult items. Skip those and return later if time per- 
mits. If you do skip an exercise, be sure to skip the corresponding row 
of brackets on the Answer Sheet. Any mark which you unintentionally place 
in the wrong place will count against you. If you change your mind, thor-
oughly erase your first mark; never place more than one cross in any row.

Do not begin work until you are told to do so. If you have any ques- 
tions, ask them now. 

Answer sheets, commonly used in wide scale testing in academic 
subject matter, have so many advantages that they should be in 
more common usage in physical education. A few advantages are 
listed below:

1. Save paper, stencils, and secretarial time as the test forms 
can be used repeatedly. 

2. Can be more accurately and conveniently scored than thumb- 
ing through pages of questions for each student. 

3. Save teacher time. 

A master answer sheet, that can be used for either multiple
choice or true-false questions in any test where the total number
of questions does not exceed seventy-five, is shown in the illustra- 
tion. This saves preparing answer sheets for each examination. The 
sheet can be made to accommodate more questions by reducing the 
size of the type. 

Number of Test Form ______                          Please do not
                                                    write in space
                      ANSWER SHEET                  below.

Name ______________________________
        (last name)     (first name)

Classification ____________      Date ____________, 19__

Activity ____________      Instructor's Name ____________

Sample for multiple choice:        Sample for true-false:
 0. ( )( )(X)( )( )                 0. (X)( )( )( )( )

     1  2  3  4  5          1  2  3  4  5          1  2  3  4  5
 1. ( )( )( )( )( )    26. ( )( )( )( )( )    51. ( )( )( )( )( )
 2. ( )( )( )( )( )    27. ( )( )( )( )( )    52. ( )( )( )( )( )
 3. ( )( )( )( )( )    28. ( )( )( )( )( )    53. ( )( )( )( )( )
 .                      .                      .
 .                      .                      .
25. ( )( )( )( )( )    50. ( )( )( )( )( )    75. ( )( )( )( )( )
     1  2  3  4  5          1  2  3  4  5          1  2  3  4  5

Figure 9. Sample of Answer Sheet for Knowledge Tests 

Punched keys for use in scoring answer sheets can be quickly 
and easily made by following the directions listed below: 

1. Attach a blank answer sheet to a piece of lightweight card- 
board with paper clips, inserting a piece of carbon paper 
between the two. (Manila filing folders work very well.) 



2. Label the key with name of the examination.

3. Trace a few of the question numbers, to be used later as a 
guide when superimposing the key on the answer sheets. 

4. Place crosses in all of the appropriate brackets, indicating 
correct answers. 

5. Remove the dummy answer sheet and carbon paper and pro- 
ceed to use a paper punch on all of the crosses. Paper punches 
with wide jaws are best for this purpose, since they permit 
reaching across to the center of the cardboard without rolling 
or folding. An ordinary “dime store” paper punch can be 
used if you cut the cardboard lengthwise into thirds. After 
completing the punching process, the cardboard can be 
taped together again. 

6. Check to see that the holes have been properly placed by 
placing the key on the dummy of correct answers. 

In scoring, superimpose the cardboard key on the answer sheet,
being careful to place it in proper position. If the margins of the
key are trimmed so that the question numbers appear, placement
is facilitated. Place a red dot in any hole where a cross does not
appear, count the number of errors, and record the number on
the answer sheet. The red dots are a convenience in going over examina-
tions later with students, as they locate the errors and also indicate
the right answer for each question missed.
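
The punched-key procedure amounts to comparing each row of brackets with the keyed position. A minimal sketch follows, assuming that answers are recorded as bracket positions counted from the left. Omissions are counted as errors, as the Directions state; treating a row with more than one cross as an error is an assumption of the example, since a cardboard key alone would not reveal it.

```python
def score_answer_sheet(key, marks):
    """Count errors on one answer sheet.

    key   -- dict: question number -> keyed bracket position (1 = first brackets)
    marks -- dict: question number -> list of bracket positions the student crossed
    Returns (number_of_errors, list_of_missed_questions).
    """
    missed = []
    for question in sorted(key):
        crossed = marks.get(question, [])
        # Correct only if exactly one cross was made and it sits in the keyed brackets.
        if crossed != [key[question]]:
            missed.append(question)
    return len(missed), missed


key = {1: 2, 2: 1, 3: 4}
student = {1: [2], 2: [3], 3: []}   # question 3 omitted
errors, missed = score_answer_sheet(key, student)
print(errors, missed)
```

The list of missed questions plays the part of the red dots: it shows where the errors are, and the key supplies the right answer for each.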


BIBLIOGRAPHY 

1. Brophy, Kathleen: “Target for Testing Accuracy in Softball Throwing,” Softball-Volleyball
Guide, A. S. Barnes & Company, New York, 1939, p. 29.

2. Schmithals, Margaret, and French, Esther: “Achievement Tests in Field Hockey
for College Women,” Research Quarterly, 11, October 1940, p. 84.

3. Scott, M. Gladys: “Achievement Examinations in Badminton,” Research Quar-
terly, 12, May 1941, p. 242.

4. Wetzel, Norman C.: “The Simultaneous Screening and Assessment of School
Children,” Journal of Health and Physical Education, 13, December 1942,
p. 576.

5. Women’s Advisory Committee on Physical Education, California State Superin-
tendent of Schools: “Physical Education: A Wartime and Peacetime Program
for Girls,” Stanford University Press, 1942, pp. 225-228.



4 


Measurement of Skill in Sports 


The development of skill is widely recognized as 
one of the major objectives in physical education. Measuring sports 
skills is an important aspect of the teaching procedure. Skill tests 
have been used for a considerable length of time, but many of the 
earlier tests had no statistical work done on them. Therefore, their 
reliability and validity were questionable. Also, it was not known 
how much they duplicated one another, or the extent to which they 
measured general motor ability rather than ability in the activity 
per se. Considerable experimentation has been done in the last 
decade, and some of the results have been published. The reports 
of these studies are not readily available to teachers, since they are 
scattered throughout the physical education literature, and some 
have not been published, heretofore. We will attempt in this chap- 
ter to present the ones which are of proven value and to include 
enough information on each that they can be used. Tentative 
standards, when available, are suggested for many of the tests. The 
source of the data for each scale is given so that the reader will have 
some basis for deciding whether the scale is applicable to the class 
for which the test is being used. 

It is not our purpose to duplicate the work of previous writers 
in presenting a critical review of all published tests. 

BASIS FOR SELECTION 

Tests have been selected which come closest to 
meeting the criteria set forth in Chapter 2, Characteristics of A 
Good Test. The bibliography presented at the end of this chapter 
is provided for those persons wishing to study more of the avail- 
able published tests, or those persons wishing to give a more exten- 
sive battery of tests. 


STANDARDS OR ACHIEVEMENT SCALES 




The standards or achievement scales presented here 
are based on scores made by fairly limited and sometimes rather 
select groups. The group is described in each case. The scales are 
not to be interpreted in the same manner as nation-wide norms. 
Unfortunately, data have not been collected for boys and men, 
though the majority of the tests should be useful for that group. 
This is not too great a handicap as usable scales can be constructed 
on scores made by classes in any school. Frequently this is necessary 
even though other scales are available. 

BADMINTON 
1. Serve

Equipment: 1. A clothesline rope stretched 20 inches di-
rectly above the net and parallel to it, at-
tached to the same standards as the net. New
birds and tightly strung racquets.

2. Floor markings 

Using the intersection of the short service 
line and the center line as a midpoint, de- 
scribe a series of arcs in the right service 
court at distances of 22 inches, 30 inches, 
38 inches, and 46 inches from the midpoint, 
measurement including the width of the 2 
inch line. Extend these arcs from the short 
service line to the center line, as indicated in 
the diagram. (Figure 10) The lines should be 
painted in different colors to increase accu- 
racy in scoring. Show card paint, which can 
be washed from the floor, is suggested. 

Test: The player being tested shall stand any place in the 
service area diagonally opposite the target, 
and shall serve twenty birds, attempting to 
send them through the space between the 
rope and the net in such a manner that they 
land in the right service court for the doubles
game. The scorer shall stand near the center
of the left service court on the same side of 
the net with the target and facing the target. 
The corner of the target nearest the intersec- 
tion of the service line and center line counts 
five points, the next space four points, the 
next three, then two, and any bird off the 
target but in the service area for the doubles 
game counts one point. 



Figure 10. Floor Markings for Badminton Serve Test 
1-5 score for respective areas 

Scoring: No score for any trial which fails to go between 
the rope and the net or which fails to land in 
the service court for the doubles game. Any 
bird landing within an area or on the line 
surrounding an area is scored as shown in the 
diagram. Any bird landing on a line divid- 
ing two scoring areas shall receive the score 
of the higher area. The score for the entire 
test is the total of twenty trials. It is con- 
sidered a foul and the trial is repeated if 
the serve is illegal.* 

* For definition of legal serve, see American Badminton Association rules. 
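
The scoring rules for the serve test can be sketched in code. The sketch assumes that the scorer records, for each trial, the distance in inches from the landing point to the intersection of the short service line and the center line; that representation and all names are assumptions of the example. Because each arc is measured to include its 2 inch line, a bird on a dividing line falls within the smaller radius and so takes the higher score.

```python
# Outer radius of each scoring area in inches, paired with its point value.
ARC_SCORES = [(22, 5), (30, 4), (38, 3), (46, 2)]


def serve_trial_score(distance_in, between_rope_and_net, in_service_court):
    """Score one legal serve (illegal serves are fouls and are simply repeated)."""
    if not (between_rope_and_net and in_service_court):
        return 0
    for radius, points in ARC_SCORES:
        if distance_in <= radius:
            return points
    return 1  # off the target but inside the doubles service court


def serve_test_score(trials):
    """Total for the test; `trials` is a list of twenty argument tuples."""
    return sum(serve_trial_score(*trial) for trial in trials)


print(serve_trial_score(22, True, True))    # on the innermost line
print(serve_trial_score(50, True, True))    # in court, off the target
print(serve_trial_score(10, False, True))   # passed over the rope
```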



Measurement of Skill in Sports 51 

Reliability: Odd-even method, stepped up by the Spear- 
man-Brown Prophecy Formula, .88, com- 
puted on 29 major students, State University 
of Iowa.* 
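
For readers who wish to repeat the step-up, the Spearman-Brown Prophecy Formula in its usual textbook form is r' = kr / (1 + (k - 1)r), where r is the correlation between the halves and k is the factor by which the test is lengthened (2 for odd-even halves). The formula is not quoted in this chapter, so the sketch below is offered as the standard form rather than as a transcript of the authors' computation.

```python
def spearman_brown(r, factor=2.0):
    """Estimate the reliability of a test `factor` times as long as the one giving r."""
    return factor * r / (1.0 + (factor - 1.0) * r)


# A half-test correlation of .50 steps up to about .67 for the full test.
print(round(spearman_brown(0.50), 2))
```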

Validity: The validity of the test was found to be .66, when 
correlated with a criterion of tournament 
standings (ladder tournament carried on 
throughout twenty class periods). 

T-scores: Computed on over two hundred University of 
Minnesota students (not physical education 
majors) at the end of about thirty class 
periods of instruction. See page 55. 

Reference: Scott, M. Gladys: “Achievement Examinations
in Badminton,” Research Quarterly, 12,
May 1941, p. 242. (Quoted by permission
of the Research Quarterly.)

Comments: This test measures accuracy of placement and 
also the ability to serve the bird in a low 
flight. It is easy to administer, and can be 
given off the courts, so that it does not inter- 
fere with play. See Chapter 3 for sugges- 
tions. The amount of practice should be held 
constant for all players and the test should 
not be administered until the majority have 
acquired the ability to make short, low 
serves. The condition of equipment affects 
the scores decidedly. 

* See p. 241 for a discussion of the statistical procedures.

2. Clear

Equipment: 1. A clothesline rope stretched across the court
14 feet from the net and parallel to it, at a
height of 8 feet from the floor.

2. Floor markings

a. Construct a line 2 feet nearer the net
than the rear service line in the doubles
game and parallel to it. Measure from the
exact center of the line. Extend this line
from one outer alley line to the other
outer alley line.

b. On the same side of the net, construct a
line 2 feet farther from the net than the
rear service line in the singles game and
parallel to it. Measure from the exact
center of the line. Extend this line from
one outer alley line to the other outer alley
line. The lines should be painted differ-
ent colors to increase accuracy in scoring.

c. On the opposite side of the net, draw
marks 2 inches square at spots indicated
on the diagram as X and Y. The center of
X should be 11 feet from the net and 3
feet from the center line toward the left
sideline. The center of Y should be 11
feet from the net and 3 feet from the
center line toward the right sideline. In
measuring from the center line, use the
exact center of the line.



2-5 score for respective areas; X-Y limits of set up for clear stroke

Test: The player being tested shall stand between the two 
square marks, on the court opposite the tar-
get. The person giving the tests (player with
considerable experience) shall stand on the
intersection of the short service line and the 
center line on the same side of the net as 
the target and shall serve the bird to the 
player being tested. The bird must cross the 
net with enough force to carry it to the line 
between the two squares before it touches 
the floor. (This is an imaginary line.) If it 
does not go that far or is outside the space 
between the two squares, the player being 
tested should not play it. The player being 
tested may move any place he wishes as soon 
as the bird has been hit to him. Only birds 
played by the player being tested shall count 
as trials. He shall attempt to send the bird 
by means of a clear stroke above the rope 
so that the bird lands on the target. Twenty 
trials are allowed. The person giving the 
test should call out the score of each trial, to 
be recorded by an assistant. The area be- 
tween the two rear lines of the regulation 
court counts five points, the space just be- 
hind it counts three points, and the space just 
in front of the two rear lines of the regula- 
tion court counts four points. Any bird go- 
ing over the rope but failing to reach the 
target counts two points. 

This test can be given to two players at 
once on the same court, extending the imagi- 
nary line farther. 

Scoring: No score for any trial failing to go over the rope or 
failing to land in the court in the space be- 
hind the rope and on the target, as indicated 
on the diagram. Any bird landing within an 
area or on the line surrounding the area is 
scored as shown in the diagram. Any bird 
landing on a line dividing two scoring areas 
shall receive the score of the higher area.
The score for the entire test is the total of
twenty trials. It is considered a foul and the 
trial is repeated if the stroke is “carried” or 
“slung.” * 
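
The point values for the clear test can be applied with a small lookup. In this sketch the scorer names the area or areas the bird touched; the zone names are invented for the example. A bird on a dividing line is reported with both adjoining areas and takes the higher value, as the scoring rule requires.

```python
# Point value of each scoring area of the clear test (zone names are hypothetical labels).
CLEAR_ZONE_POINTS = {
    "between_rear_lines": 5,    # between the two rear lines of the regulation court
    "front_of_rear_lines": 4,   # just in front of the two rear lines
    "behind_rear_lines": 3,     # just behind the regulation court
    "short_of_target": 2,       # over the rope but short of the target
}


def clear_trial_score(zones_touched, went_over_rope=True):
    """Score one clear; `zones_touched` is empty when the bird lands out of bounds."""
    if not went_over_rope or not zones_touched:
        return 0
    return max(CLEAR_ZONE_POINTS[zone] for zone in zones_touched)


def clear_test_score(trials):
    """Total of twenty trials, each given as (zones_touched, went_over_rope)."""
    return sum(clear_trial_score(zones, over) for zones, over in trials)


print(clear_trial_score(["front_of_rear_lines", "between_rear_lines"]))  # on the dividing line
```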

Reliability: Correlation by the odd-even method, .91; 

stepped up with the Spearman-Brown 
formula, .96. The subjects were the same as 
for Test 1. 

Validity: .60 with a criterion of tournament rankings. 

T-scores: See page 55.

Reference: Same as for Test 1. 

Comments: This test, even with the disadvantage of the bird 
being put in play by another player, is so 
highly reliable that it would appear that 
experimentation should be done to deter- 
mine the effect of fewer trials. The compara- 
tively low validity also indicates the need 
for further study. It is an excellent practice 
device. It measures power and to some ex- 
tent accuracy in the strokes. It would also 
seem logical that it is a measure of the play- 
er’s judgment. 


BATTERY OF BADMINTON TESTS 

The one best test appears to be the serve test, as de- 
scribed here. The experimental battery included six tests, two 
each on serve, clear, and drop. The validities on the drop tests 
were low. The clear and serve tests described here have a low in- 
tercorrelation, .11, and therefore, appear to be measuring quite 
different abilities. When the scores on the two tests are combined, 
by means of multiple correlation, the validity coefficient was .85, 
which indicates that they make a satisfactory battery. The formula 
for insuring proper weighting to the two tests, when combined, is 
as follows: 

1.0 serve + 1.2 clear
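
As a worked illustration of the weighting, the sketch below combines a serve score and a clear score with weights of 1.0 and 1.2. The function name is an assumption of the example, and the text does not say here whether raw scores or T-scores are meant, so the function simply combines whatever numbers are supplied.

```python
def badminton_battery_score(serve, clear, serve_weight=1.0, clear_weight=1.2):
    """Weighted sum of the two badminton test scores."""
    return serve_weight * serve + clear_weight * clear


# A serve score of 40 and a clear score of 50 combine to 40 + 60 = 100.
print(badminton_battery_score(40, 50))
```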


* See official A.B.A. rules for interpretation of terms. 



T-SCORES FOR BADMINTON TESTS 

These scales were constructed from data collected 
in college classes, University of Minnesota. The tests were admin- 
istered at the end of about thirty class periods of instruction, each 
period thirty-five minutes in length. This was the first season of 
badminton for the great majority. 


TABLE II

T-Scales for Badminton Tests

          SERVE TEST                          CLEAR TEST
 Raw      T-      Raw      T-        Raw      T-      Raw      T-
 Score    Score   Score    Score     Score    Score   Score    Score

 70 up    79      40       51        94-96    78      48-49    44
 69       74      39       50        92-93    74      46-47    43
 68       73      38       49        90-91    72      44-45    42
 67       72      37       48        88-89    70      42-43    42
 66       69      36       48        86-87    67      40-41    41
 65       69      35       47        84-85    64      38-39    41
 64       68      34       47        82-83    61      36-37    40
 63       67      33       46        80-81    59      34-35    39
 62       66      32       45        78-79    57      32-33    39
 61       65      31       45        76-77    56      30-31    38
 60       63      30       44        74-75    55      28-29    38
 59       62      29       44        72-73    54      26-27    37
 58       62      28       43        70-71    52      24-25    36
 57       61      27       43        68-69    51      22-23    36
 56       60      26       42        66-67    50      20-21    35
 55       59      25       41        64-65    49      18-19    34
 54       59      24       41        62-63    49      16-17    33
 53       58      23       40        60-61    48      14-15    32
 52       58      22       40        58-59    47      12-13    32
 51       57      21       39        56-57    46      10-11    31
 50       56      20       39        54-55    46       6-9     30
 49       56      19       38        52-53    45       2-5     27
 48       55      18       38        50-51    45
 47       55      17       36
 46       54      16       35
 45       53      15       34
 44       53      14       33
 43       53      13       32
 42       52      12       31
 41       51      11       31
                  10       29
                  5-9      27
                  1-4      23









BADMINTON SCALES FOR PHYSICAL 
EDUCATION MAJOR STUDENTS 

The scales below are indicative of what may be 
expected of physical education majors at the end of seven lessons, 
35 minutes in length. The subjects were 43 women, majors in 
physical education, University of Minnesota. The scores are trans- 
lated into a grading plan: 7% A, 24% B, 38% C, 24% D, 7% 
failure. 
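
The cut-off scores of the two scales that follow can be applied mechanically. The sketch below is a hedged illustration: the lower bound of each grade band is taken from those scales (with the D band of the clear test read as 34-55), while the function and variable names are assumptions of the example.

```python
# (lowest raw score earning the grade, grade), highest band first.
SERVE_SCALE = [(56, "A"), (45, "B"), (25, "C"), (20, "D")]
CLEAR_SCALE = [(81, "A"), (72, "B"), (56, "C"), (34, "D")]


def letter_grade(raw_score, scale):
    """Return the grade for a raw score; scores below every band are marked 'Fd.'."""
    for cutoff, grade in scale:
        if raw_score >= cutoff:
            return grade
    return "Fd."


print(letter_grade(47, SERVE_SCALE))
print(letter_grade(33, CLEAR_SCALE))
```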

SERVE TEST               CLEAR TEST
Range: 13-70             Range: 6-86
56 and up     A          81 and up     A
45-55         B          72-80         B
25-44         C          56-71         C
20-24         D          34-55         D
19 and below  Fd.        33 and below  Fd.

BASKETBALL

1. Half-Minute Shooting

Equipment: No special equipment. Balls should be well
inflated. Stop watch.

Test: Player stands at any position he selects near the basket,
with a ball in his hands. On the signal,
Ready, Go! he starts shooting and continues
until the signal to stop, attempting to make
as many baskets as possible within the 30 sec-
onds. If the ball has left his hands when the
signal to stop sounds, the basket counts, if
made. Two trials are given each player.

Scoring: The number of baskets made in 30 seconds is the
score for each trial. The better of the two
trials is recorded.

Reliability: r = .58 on trials 1 and 3 versus trial 2; stepped
up by the Spearman-Brown formula, .68, for
107 girls in Proviso Township High School,
Maywood, Illinois. (Jones)
r = .73 on 190 high school boys. (Johnson)
r = .54 on first and second trials; subjects
were 209 college freshman women, Univer-
sity of Iowa. (Scott)

Validity: r = .58 for the sum of three trials with a rating 
criterion. (Jones) 

.71 on 190 high school boys. (Johnson) 

.60 with a sports tests criterion, 155 college 
freshman women, University of Iowa. (Scott) 

T-scores: See page 58.

References: 1. Johnson, L. W.: Objective Tests in Basket- 
ball for Boys. Unpublished M. A. thesis. Uni- 
versity of Iowa, 1934. 

2. Jones, Edith: A Study of Knowledge and 
Playing Ability in Basketball for High 
School Girls. Unpublished M. A. thesis, Uni- 
versity of Iowa, 1941. 

3. Scott, M. Gladys: “Assessment of Motor
Abilities of College Women through Objec-
tive Tests,” Research Quarterly, 10, Oct.,
1939, p. 63.

Comments: This test measures the ability to hit the spot at 
which one is aiming, and also the ability to 
judge rebounds, to move quickly to get to 
the ball and to put it in play again quickly. 
This makes it a good test for all players, re- 
gardless of position. If there are as many as 
six baskets available, the test can be adminis- 
tered in a very few minutes to the usual class. 
Since no two backboards are exactly alike, the 
conditions for all can be somewhat equalized 
by having players rotate to a new basket for 
the second trial. A minimum of three trials 
is recommended for relatively inexperienced 
players. 

2. Wall Pass

This test is described in Chapter 6, Measurement of Gen-
eral Motor Ability.





TABLE III

T-Scales for Basketball Tests

          Half-Minute Basket                       Half-Minute Basket
               Shooting         Wall                    Shooting         Wall
 T-        (a)        (b)       Pass (c)   T-       (a)        (b)       Pass (c)
 Score     (176)(d)   (486)(d)  (105)(d)   Score    (176)(d)   (486)(d)  (105)(d)

 81                   18                   50                            34
 80                                        49       5          5
 79        15                              48
 77                   17                   47
 76                   16        42         46                  4         33
 75        14                              45
 74                   15                   44
 73                                        43       4                    32
 72        13         14                   42
 71        12                   41         41                  3         31
 70                   13                   40
 69                                        39
 68        11         12        40         38
 67                                        37       3                    30
 66        10         11                   36                  2
 65                             39         35
 64         9         10                   34
 63                                        33                            29
 62                    9                   32
 61         8                   38         31                  1
 60                                        30                            28
 59                    8                   29       2
 58         7                   37         28                            27
 57                                        27
 56                    7        36         26
 55                                        25
 54         6                              24                            26
 53                    6        35         23
 52                                        22
 51                                        21       1

(a) Scale was constructed on data obtained from girls in ninth grade in Blue Island Community High
School, Blue Island, Illinois, and eleventh grade in Proviso Township High School, Maywood, Illinois. The
scores were on the best one out of two trials.

(b) Scale was constructed on data obtained from college students in University of Iowa and University of
Minnesota.

(c) Scale was constructed on data obtained from girls in eleventh grade in the Proviso Township High School,
Maywood, Illinois.

(d) Number of cases in the distribution on which the scale was constructed.






Reliability: r = .59 on trials 1 and 3 versus 2, with Spear- 
man-Brown formula, .68. (Jones) 
r = .62 on a single trial on two successive 
days, 188 college freshman women, Univer- 
sity of Iowa. (Scott) 

Validity: r = .44 for the sum of three trials with a rating 
criterion. (Jones) 

r = .62 with a sports test criterion, 155 col- 
lege freshman women, University of Iowa. 

T-scores: See Tables XI and XII. 

References: See references for Test 1. (Jones, Scott) 

Comments: This test measures the ability to pass and catch 
rapidly. It can be administered at the same 
time that the half-minute basket shooting 
test is being given, if wall space permits. 
More trials may be helpful. 


BATTERY OF BASKETBALL TESTS 

It is not known how much the two tests given here 
overlap. If time permits the administration of just one test, the 
half-minute shooting test should be given. If only two baskets are 
available, the half-minute test will take considerable time to give. 
In that case only part of the class should be assigned to tests at a 
given time, and the rest of the class can practise on other tech- 
niques in the center of the floor. If the wall pass has been given in 
the general motor ability battery, it need not be repeated at the 
beginning of the basketball season. 


FIELD HOCKEY 

1. Dribble, Dodge, Circular Tackle, and Drive 

Equipment: 1. Hockey stick for each participant; stop 
watch; one or two balls; and two high jump 
standards (or other portable post) with 
round bases. 



2. Field markings

(a) A line 20 feet long to be used for a
starting line.

(b) A line perpendicular to the midpoint 
of the starting line and extending 35 
feet from it. This is the foul line. 

(c) A line 10 feet long, perpendicular to and
bisected by the foul line at a point 30
feet from the starting line. This is the
restraining line.

(d) A line 1 foot long, perpendicular to and
bisected by the foul line at a point 35
feet from the starting line.

(e) Two lines, each 1 foot long, bisecting
each other at a point which is 45 feet
from the starting line and in a straight
line with the foul line. See Figure 12.

Figure 12. Field Markings and Action Sequence for Field Hockey Test 1
A-B: jump standards. Line styles (not reproduced) distinguish the dribble, the drive,
the path of ball (in dodge), and the path of player (in dodge).

3. Position of standards 

(a) One standard is placed so that the middle 
of the base of the standard is directly 
over the point where the foul line and 
the line described in 2(d) bisect each 
other. 

(b) The other standard is placed in a simi- 
lar fashion over the point formed by 
the two lines described in 2(e). 

Test: The player being tested shall stand behind the starting
line with the hockey ball placed on the start-
ing line at any point to the left of the foul
line. At the signal, Ready, Go! the player
shall dribble the ball forward to the left of 
and parallel to the foul line. As soon as the 
restraining line is reached, the ball shall be 
sent from the left side of the foul line to the 
right of the first obstacle (from the player’s 
point of view), and the player shall run 
around the left side of the obstacle and re- 
cover the ball. (This is analogous to the 
dodge.) Next, the player shall execute a turn 
toward her right around the second obstacle, 
still keeping control of the ball. (This is 
analogous to the circular tackle.) As soon 
as possible after that the ball shall be driven 
toward the starting line. If the drive is not
hard enough to reach the starting line the
player must follow it up and hit the ball
again. This procedure shall be repeated un-
til six trials have been given; players are
alternated on trials to avoid their becoming 
fatigued. 

Scoring: The score for one trial shall be the time it takes 
from the signal “Go” until the player’s ball 
has again crossed the starting line. The score 
for the entire test is the average of the six 
trials. It is considered a foul and the trial 
does not count if: 

1. The ball or player crosses the foul line 
before reaching the restraining line. 

2. In executing the dodge, the ball is not 
sent from the left side of the foul line. 
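
The scoring of this test reduces to an average of timed trials. The sketch below assumes that a fouled trial is set aside and another is run in its place, so that six valid times are eventually available; the text says only that a fouled trial does not count, so that reading, like the names used, is an assumption.

```python
def hockey_test1_score(trials, required=6):
    """Average time in seconds over the first `required` valid trials.

    trials -- list of (seconds, fouled) pairs in the order they were run
    """
    valid_times = [seconds for seconds, fouled in trials if not fouled][:required]
    if len(valid_times) < required:
        raise ValueError("fewer than %d valid trials recorded" % required)
    return sum(valid_times) / required


# One fouled trial is ignored; the six valid times average 21.0 seconds.
runs = [(20.0, False), (19.0, True), (22.0, False), (21.0, False),
        (20.0, False), (22.0, False), (21.0, False)]
print(hockey_test1_score(runs))
```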

Reliability: r = .92 when computed by the odd-even method, 
and stepped up by the Spearman-Brown 
formula. The subjects were 51 players in 
college classes and hockey club. University 
of Iowa. 

Validity: r = .44 when correlated with criteria determined
by ratings made by three nationally rated 
umpires. 

T-scores: See page 69. 

Reference: Schmithals, Margaret and French, Esther: 

“Achievement Tests in Field Hockey for Col- 
lege Women,” Research Quarterly, 11, Oc- 
tober, 1940, p. 84. (Quoted by permission of 
the Research Quarterly.) 

Comments: This test has been used for several years, in com- 
bination with a test of knowledge, for early 
season classification, and serves adequately. 
It calls for the use of a variety of skills, the 
ability to control the ball while moving rap- 
idly, and to make quick changes of direction. 
It can be administered along the edges of the 
field and requires no special equipment, 
other than a stop watch and jumping stand-
ards. The test authors think that the low
validity may be partially due to the difficulty
in making discriminations in ratings. Field 
hockey presents a very difficult rating prob- 
lem, with so many players and the small 
number of times that the ball is played by 
some players. The judges rated the players 
on two successive playing periods, which was 
not adequate to secure a wide range in 
ratings. 

2. Goal Shooting: Straight, Right, Left

Figure 13. Target for Hockey Test 2
a: side view
b: front view and scoring scheme

Equipment: 1. Target, 9 inches wide, 12 feet long and at
least 1/2 inch thick, made of hard wood. The
board is painted according to the following
specifications: The length of the board is
divided into eleven equal spaces, alternate
spaces starting from either end being painted
black and the others remaining the natural
color of the wood. Numbers are painted in
the spaces in contrasting colors (black on
light background and white on black back-
ground) in the following order starting from
either end: 1-2-3-4-5-6-5-4-3-2-1 (See Figure
13). A base made of board at least 3 inches
wide, exactly 12 feet long, and at least 7/8
inch thick, is nailed on the bottom of the tar-
get so that 2 1/2 inches extend beyond the
back of the target. The board, in order to 
stand upright securely, may be anchored 
with an ice pick or other similar device. 
Hockey stick for each participant, four to ten 
balls, stop watch. 

2. Field markings 

(a) A line 6 1/2 feet long to be used as a start-
ing line.

(b) A rectangle, 11 feet long and 6 1/2 feet
wide, 15 feet from the starting line.
Point A is the midpoint of the side oppo-
site the starting line.

(c) A line 12 feet long, called the center tar-
get line, parallel to and 60 feet from the
starting line.

(d) A line 12 feet long, called the right inner
target line.

(e) A line 12 feet long, called the left inner 
target line. 

(f) The target is placed directly on the speci- 
fied line with the numbers facing the 
starting line and the board anchored 
with ice picks. For the straight drive, it is 
placed on the center target line, for the 
drive from right and left inners’ posi- 
tions, the right and left inners’ target 
lines, respectively. (See Figure 14) 

Test: 1. Drive from center’s position. The player being 
tested shall stand behind the starting line 
with the hockey ball placed on the starting 
line. At the signal, Ready, Go! the ball shall 
be dribbled to the rectangle, from within 
which area it must be driven toward the 
board (placed on the center target line). The
procedure shall be repeated until ten trials
have been given. 

2. Drive from right inner’s position. The same pro-
cedure shall be repeated, the only difference
being the position of the target, which is
placed on the right inner target line.

3. Drive from left inner’s position. The same proce-
dure shall be repeated, the only difference
being the change in position of the target
to the left inner target line.


Figure 14. Field Markings for Hockey Test 2

A: center of front line of rectangle, from which target lines are measured
R: right inner target line
C: center target line
L: left inner target line
Dimensions marked on the figure: 12 feet and 6 feet 6 inches

Scoring: The score for one trial shall constitute the time 
elapsing from the timer’s signal, Go! until 
the ball strikes the target. The score for the
entire test is the sum of the first and second
best odd and first and second best even num- 
bered trials made on the center drive, the 
right inner drive, and the left inner drive. 
The score shall count if the ball bounces 
over the top of the target; in this case, the 
time shall be taken until the instant the ball 
clears the target. The score shall be zero if 
the ball is not driven from within the rec- 
tangle, or if the driven ball fails to reach 
the target or misses it at either end. The at- 
tempt shall not be counted as a trial if 
“sticks” are made, or if the player raises the 
ball so that it fails to touch the ground be- 
fore passing above the target. 

Reliability: r = .92 when the Spearman-Brown Prophecy
formula is applied. The subjects are the
same as for Test 1.

Validity: r = .48 with the same criterion and for the same
group as Test 1.

T-scores: See page 69. 

Reference: Same as for Test 1. 

Comments: This test is designed to measure the ability of
the player to adjust footwork, to judge space 
while moving, and to drive with accuracy 
and force while running. It is an excellent 
practice device, and can be divided into 
parts, administering just one (goal shooting 
left), if time is limited. The reliability on 
goal shooting left is adequate (.87) and the 
validity is substantially the same as for the 
entire test (.44). Note that the accuracy 
score is used to make the situation some- 
what analogous to the game when the player 
has to decide how much speed she can afford 
to sacrifice for the sake of accuracy. The 
player is given two scores for this test: a speed 
score, and an accuracy score. The time score
is the only one that actually is used in combining
this test with others in a battery.

3. Fielding and Drive 

Equipment: 1. Hockey stick for each participant, three to 
seven balls, two ice picks with brightly col- 
ored tops, regulation hockey goal, stop watch. 

2. Field markings (see Figure 15) 

(a) Goal line extending across the area be- 
tween goal posts. 

(b) Foul line, 12 feet long, parallel to and 
10 feet from the goal line, located directly
in front of the goal area. The ice picks 
are placed on the foul line at points di- 
rectly opposite each goal post. 

(c) Restraining line, 30 feet long, parallel 
to and 10 feet from the foul line. 

(d) Regulation striking circle in front of 
the goal. 


Figure 15. Field Markings for Hockey Test 3

(Diagram not reproduced. Its legend identifies: player in position to start the test; examiner in position to roll ball; rolled ball; stop; tap; drive. Points A and B are labeled on the diagram.)

Test: The player being tested shall stand behind the goal
line. The examiner shall stand at the edge of
the striking circle directly in front of the goal
with a hockey ball in one hand and a stop
watch in the other. At the examiner’s signal,
Ready, Go! the hockey ball is rolled toward
the goal. Simultaneously, the player shall run
forward and attempt to field the ball before
it reaches the foul line, tap it once, and drive
it out of the striking circle from within the
area between the restraining line and the
foul line. This procedure shall be repeated
until sixteen trials have been given.

Instructions to examiner: The ball should be rolled
as uniformly as possible for all trials and for
all players. With a little practice a roll
may be achieved which covers approximately 45
feet in 1.7 seconds.

Scoring: The score for one trial is the time from the instant 
the player first touches the hockey ball to the 
instant the ball reaches the edge of the strik- 
ing circle. The score on the entire test is the 
sum of the average of the three best even and 
the three best odd numbered scores of the 
sixteen trials. The attempt does not count as 
a trial if the rolled ball does not pass between 
the two ice picks, or if not delivered at ap- 
proximately the designated speed. The 
player receives a zero score on a trial if the 
ball is advanced illegally, or if it rolls wholly 
over the foul line before or after it is 
touched by the player's stick. The zero score is 
assigned, too, if the ball is not driven out of 
the striking circle from within the area 
bounded by the restraining line and the foul 
line, or if the ball is not controlled; that is, 
stopped, and tapped, before being driven. 
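
The arithmetic of the scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, "best" is taken to mean the shortest time, and trials scored zero are assumed to be left out when the best times are picked (the text does not say how zero scores enter the sum).

```python
from statistics import mean


def fielding_and_drive_score(trial_times):
    """Return the test score from sixteen trial times in seconds.

    trial_times[0] is trial 1. A value of 0 marks a trial scored zero.
    Assumption (not stated in the text): zero-score trials are skipped
    when the three best odd and three best even trials are chosen.
    """
    if len(trial_times) != 16:
        raise ValueError("the test calls for sixteen trials")
    odd = [t for n, t in enumerate(trial_times, start=1) if n % 2 == 1 and t > 0]
    even = [t for n, t in enumerate(trial_times, start=1) if n % 2 == 0 and t > 0]
    if len(odd) < 3 or len(even) < 3:
        raise ValueError("fewer than three timed trials in one half")
    # Assumption: "best" means shortest time.
    return mean(sorted(odd)[:3]) + mean(sorted(even)[:3])
```

Splitting the trials into odd and even halves is also what makes the odd-even reliability figure quoted for this test possible.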

Reliability: r = .90 by the odd-even method, stepped up by 
the Spearman-Brown formula. The subjects 
were the same as for Test 1.

Validity: r = .48 with the same criterion as described for
Test 1.

T-scores: See below.

Reference: Same as for Test 1. 

Comments: The effect on the score of having the ball put in 
play by another person than the one being 
tested appears to be very slight, as the pro- 
cedure is well standardized. This test is 
costly in time to administer. 

BATTERY OF FIELD HOCKEY TESTS 

The best single test for classification of college stu- 
dents is the dribble, dodge, circular tackle, and drive. The best 
combination of two tests, as determined by multiple correlations, 
is the goal shooting left (one portion of test two) and the fielding 
and drive test. They measure quite different things as is indicated 
by the low intercorrelation of .22. The two tests together yielded 
a correlation coefficient of .60. The addition of Test 1 to this bat- 
tery raises the correlation only slightly (.62). The formula for 
combining scores on the two tests (goal shooting left and the fielding 
and drive) to insure proper weighting is as follows: 

1. goal shooting left + 1.2 fielding and drive.
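
As a small illustration, the weighting can be applied as follows. The function name is hypothetical, and the text does not say whether raw times or T-scores are to be entered, so the inputs here are simply whatever two scores the teacher has chosen to combine.

```python
def hockey_battery_score(goal_shooting_left, fielding_and_drive):
    """Weighted composite for the two-test field hockey battery.

    Weights (1.0 and 1.2) are the ones given in the text; the choice of
    raw scores versus T-scores as inputs is left to the user here.
    """
    return 1.0 * goal_shooting_left + 1.2 * fielding_and_drive
```

For example, two scores of 50 each would combine to about 110.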

TABLE IV

T-Scales for Field Hockey Tests

T-Score   Dribble, Dodge,     Goal            Fielding       T-Score
          Circular Tackle,    Shooting (b)    and Drive (b)
          and Drive (a)

73                            19.4-19.6       4.1            73
71        10.1-10.5           19.7-21.4       4.3            71
69                                                           69
68                            21.5-21.7       4.4            68
67        10.6-11.0                                          67
66                                                           66
65        11.1-11.5                           4.5            65
64                            21.8-22.0       4.6            64
63                                                           63
62                                            4.7            62
61        11.6-12.0           22.1-22.3       4.8            61
60                            22.4-22.6                      60
59                            22.7-22.9       4.9            59
58        12.1-12.5           23.0-23.2                      58
57                            23.3-23.8       5.0            57
56        12.6-13.0           23.9-24.1       5.1            56
55                            24.2-25.0       5.2            55
54        13.1-13.5           25.1-25.3       5.4            54
53                            25.4-25.6       5.6            53
52        13.6-14.0                           5.8            52
51                            25.7-25.9       5.9            51
50        14.1-14.5           26.0-26.5       6.0            50
49                            26.6-26.8       6.1            49
48        14.6-15.0           26.9-27.1       6.2            48
47        15.1-15.5           27.2-27.4       6.4            47
46                            27.5-28.0       6.5            46
45        15.6-16.0           28.1-28.3                      45
44                                                           44
43        16.1-16.5           28.4-28.6       6.6            43
42        16.6-17.0           28.7-28.9                      42
41                            29.0-29.5       6.7            41
40        17.1-17.5           29.6-29.8       6.9            40
39                            29.9-30.7       7.2            39
38        17.6-18.0           30.8-31.0       7.3            38
37        18.1-18.5           31.1-31.9                      37
36        18.6-19.0           32.0-32.8       7.4            36
35                                            7.6            35
34        19.1-19.5           32.9-33.1                      34
33        19.6-20.0                           7.7            33
32                            33.2-33.4       8.0            32
31        20.1-20.5           33.5-33.7       9.0            31
30                                                           30
29        20.6-21.0           33.8-34.3       9.2            29
28                                                           28
27        21.1-24.5           34.4-34.6       10.7           27
25        24.6-30.0                                          25

(a) This scale was constructed on data collected on 81 students, Illinois State Normal University. Average
of three trials was used.

(b) These scales were constructed on data collected on 51 students, University of Iowa.
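
Reading a T-score from such a scale is a simple range lookup. The sketch below copies only four Goal Shooting rows from the table (T-scores 60 to 57) for illustration; the names `GOAL_SHOOTING_ROWS` and `t_score` are hypothetical, and a full implementation would carry every row of the column.

```python
# (T-score, lowest time, highest time) for a few Goal Shooting rows.
GOAL_SHOOTING_ROWS = [
    (60, 22.4, 22.6),
    (59, 22.7, 22.9),
    (58, 23.0, 23.2),
    (57, 23.3, 23.8),
]


def t_score(raw_time, rows):
    """Return the T-score whose range contains raw_time, or None."""
    value = round(raw_time, 1)  # the scale is printed in tenths of a second
    for t, low, high in rows:
        if low <= value <= high:
            return t
    return None  # outside the rows supplied
```

Because time scores are being scaled, a shorter time earns a higher T-score, which is why the ranges rise as the T-scores fall.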





SOCCER

1. Volleying

Equipment: 1. One soccer ball, fully inflated; one stop 
watch; one unobstructed wall space 15 feet 
long and 10 or more feet high. 

2. Markings 

(a) A target shall be outlined on the wall 
which is 15 feet wide, beginning 8 inches 
above the floor and continuing upward 
to at least 10 feet from the floor.

(b) A starting line, 2 feet in length, shall be 
drawn on the floor, parallel to the wall 
and 15 feet from the midpoint of the out- 
lined wall target. 

(c) An area shall be outlined on the floor 
which is 30 feet square, and it shall have 
as one side of the square the wall space 
upon which the target is marked, so that 
their midpoints coincide. See Figure 16. 

Test: Place the ball on the starting line. At the signal, Ready,
Go! kick the ball against the wall so that it 
hits within the outlined target; when the 
ball rebounds, recover it and kick again. 
Continue this as rapidly as possible. Your 
score is the number of times the ball strikes 
the wall within the target in one minute. You 
may kick, volley, or use any technique which 
is legal in a regulation soccer game. After the 
first kick, the ball may be played from the 
point of recovery (provided that this point 
is within the 30-foot square on the floor) or 
you may dribble the ball to a more advan- 
tageous position. If the ball bounds outside 
the 30-foot square, it will be stopped by one 
of the assistants and placed on the 30-foot 
square boundary line, where you may re- 
cover it and dribble or kick it again. 







Figure 16. Wall and Floor Markings for Soccer Volleying Tests 

a — wall target 
b — floor diagram 

L — center of floor area and wall area (the two should coincide) 

O — assistants to recover balls going out of the area

Note to the test administrator: Station six
assistants, two on each of the free sides of the
30-foot square as marked on the diagram by
O’s. These assistants are to stop any ball
which is about to leave the square and place
it, with their hands, on the boundary line
at the point where it crossed the line.




Practice: Allow each player one practice 
trial of one-half minute length. 

Scoring: Two trials shall be given each player, each on a 
separate day. Not more than a week should 
elapse between trials. The better of the 
two trials shall count. One point is scored 
each time the ball strikes within the correct 
area. Balls hitting lines outlining the target 
are considered good. 

Reliability: The coefficient was .67 for first and second trials. 

The subjects were 84 ninth and tenth grade 
girls, Fairview High School, Fairview Vil- 
lage, Ohio. The tests were given at the end 
of the season. 

Validity: The validity coefficient of this test was found to be 
.57 when correlated with the criterion of 
subjective ratings, established by a board of 
judges consisting of three instructors of soc- 
cer and two senior students. Another crite- 
rion was established by T-scoring the tests 
and adding the T-scores made on all tests. 
When correlated with this criterion, the co- 
efficient of the volleying test was .77. When 
such a criterion is used the coefficient is al- 
ways higher because the test being studied is 
a part of the total score used for the criterion. 

T-scores: See page 83. 

Reference: Schaufele, Evelyn F.: The Establishment of Ob- 
jective Tests for Girls of the Ninth and 
Tenth Grades to Determine Soccer Ability, 
Unpublished M.A. thesis, University of 
Iowa, 1940. 

Comments: The test appears to measure ability to control 
the ball, judgment of speed and direction, 
and skill in maneuvering the ball, all of 
which are essential skills in the game itself. 
It is easy to administer and is interesting to 
students. An increase in the number of trials
appears to be indicated. Four should be sufficient.

2. Passing and Receiving 

Equipment: 1. One soccer ball, fully inflated; one unob- 
structed wall space 55 feet in length; one 
stop watch. 

2. Markings 

(a) A restraining line shall be drawn 8 feet 
from the wall and parallel to it. 

(b) Two boundary lines, perpendicular to
the wall and 12 feet in length, shall be
drawn, one at each end of the restraining
line, extending out from the wall. These
lines shall be 55 feet apart. See Figure 17.

Test: Place the ball at the point where the restraining line 
crosses the boundary line, with the wall on 
your left (indicated by A in the diagram). 
On the signal, Go! dribble the ball forward 
a few feet and then pass it with the side of the 
foot so that it strikes the wall and rebounds. 
Run forward to meet it as it rebounds, 
and repeat. You should wait for the ball to 
cross the restraining line, as you may not 
make a pass from inside the area. However, if 
the ball has not been sent with enough force, 
you may go in to recover it and dribble it 
outside the restraining line. You have ten 
seconds to complete the distance, during 
which time you are to make three passes 
against the wall, if possible. Your score will 
be the number of successful passes and recoveries
you make. Your third pass and recovery
will not count unless you have
touched the ball beyond the finish line before
the time limit. If you recover the ball
before you reach the finish line, try to dribble
it up to the line before time is up. The
ball must be touched at least twice between
each recovery and the next pass.

Practices: Each player is given one prac- 
tice trial before the test begins. 

Trials: The test consists of two trials from 
each end, the first two from A with the wall 
to the left, and the next two from B with the 
wall to the right. 

Scoring: The score shall be the sum of the number of suc- 
cessful passes and recoveries made in the four 
trials. A successful pass and recovery shall 
be one in which the ball is kicked against the 
wall from outside the restraining line and 
first touched again with the foot outside the 
restraining line after the rebound. 

Reliability: r = .56 by the odd-even method; .72 when corrected
by the Spearman-Brown formula. The
subjects were the same as for Test 1.
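
The Spearman-Brown correction quoted here is the standard prophecy formula for a test lengthened by a given factor (2 when a half-test correlation is stepped up to the full test). The text does not print the formula, so the sketch below, with its hypothetical function name, is offered only as a check on the arithmetic.

```python
def spearman_brown(r_half, factor=2.0):
    """Predicted reliability when a test is lengthened `factor` times.

    With factor=2 this steps an odd-even (half-test) correlation up to
    the reliability of the whole test.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)
```

Applied to the odd-even coefficient of .56 reported for this test, the formula gives about .72, in agreement with the corrected figure above.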

Validity: r = .50 with the subjective criterion, .68 with the 
total test criterion described for Test 1. 

T-scores: See page 83. 

Reference: Same as for Test 1.

Comments: This is a test involving difficult skills. The re- 
liability and validity would both probably 
be higher for more experienced players. Fur- 
ther experimentation needs to be done on 
distances, length of time, number of trials, 
and perhaps even on amount of practice per- 
mitted. It is an excellent practice device. 

3. Judgment in Passing 

Equipment: 1. One or more soccer balls, fully inflated; one
stop watch; one bench six feet in length; one 
regulation soccer goal. 

2. Markings 

(a) Beginning from a point on the goal line 
and 4 feet inside the goal post, draw a 
line perpendicular to the goal line and
15 yards long. This is the restraining
line. 

(b) Extend the 12-yard line (regulation soc- 
cer marking) to a length equivalent to 
the length of the distance between the 
goal posts, and parallel to the goal line. 
This is the starting line. 



Figure 18. Field Markings for Soccer Judgment in Passing Test 

This illustrates the restraining line and position of the bench for a kick from the 
right. The bench would be moved to the left and a restraining line marked on 
the left for the kick from the left. 


(c) The 6-yard line, on all regulation fields, 
is also used in the test and is referred to 
as the 6-yard line. 

(d) The bench is placed 4 feet from the goal, 
parallel to it, and in such a position
that one end is exactly at the center of
the goal and the other extends toward 
the right side as you face the goal from 
the starting line. 

(e) For the second part of the test, the above 
directions shall be reversed, and the only 
additional line needed is a restraining 
line, four feet inside the left hand goal 
post and perpendicular to the goal line, 
15 yards long. (See Figure 18 for test 
from the right hand side.) 

Test: Place the ball on the 12-yard line, outside the restraining
line. On the signal, Ready, Go! dribble
the ball forward, keeping to the right of the 
restraining line. After you have passed the 
6-yard line, kick for goal when you think you 
are in the most advantageous position. You 
may use any kick you like, but the ball must 
be kicked from outside the restraining line 
and it must enter the goal between the left 
end of the bench and the left goal post. Each 
trial must be completed in four seconds, from 
the word “Go” until the ball is kicked. If you
take longer or violate any of the other rules, 
the trial will have to be repeated. You are 
given five trials from the right side, then five 
from the left. 

Instructions to test administrator: For 
speed in administering the test, three or more 
balls should be made available. Have each 
girl recover her own ball, carry it to the side 
of the field, and roll it to the person nearest 
the head of the line who does not have a ball. 
The entire test can be administered to as 
many as forty girls in forty minutes. Only 
five or six girls should be assigned to this 
test at one time. 




Scoring: One point is given for each goal scored in a legal 
trial. Five legal trials are given as described 
above, and then the bench is set up on the 
other side of the goal and the entire test is 
repeated on that side. The score is the total 
for the ten trials. 

Reliability: r = .69 when computed by the odd-even method.
When the Spearman-Brown formula was applied,
the reliability coefficient was raised to
.82. The subjects were the same as for Test 1.

Validity: r = .34 with the subjective criterion; .65 with the 
total test criterion as described for Test 1. 

T-scores: See page 83. 

Reference: Same as Test 1. 

Comments: This test is easily and quickly administered. The 
low validity figure is perhaps explained by 
the fact that judgment in placement and 
timing of passes is difficult to rate in an actual 
game situation. 

Combination of Skills Test 

Equipment: 1. Soccer balls, fully inflated; 2 goal posts or 
standards; one stop watch; playing surface 
55 yards in length; and a wall space 12 feet 
long and 30 inches high. (Three locker room 
benches can be placed on their sides and 
stacked to present a smooth surface upon 
which the correct area can be painted with 
show card paint. They should be braced and 
stacked in place, so that there are no projec- 
tions and so as to permit very little vibra- 
tion.) 

2. Field markings 

(a) The soccer goal is 18 feet in width. Opposite
the goal, mark a line 6 feet in length,
parallel to the goal line and 55 yards from 
it. The center of this line should be di- 
rectly opposite the center of the line be- 
tween the goal posts. This is the starting 
line. 




(b) Mark another line, to be used as a re- 
straining line, extending from the left 
end of the starting line (as you face the 
goal) 25 yards toward the goal line. 

(c) The “wall” is placed 12 feet to the left 
of this restraining line, so that the center 
of the marked target is directly opposite 
the end of the 25-yard line. 

(d) A 6-yard line, parallel to the end line, 
is also used. 

(e) The description above is for testing with 
the right foot. When testing with the left 
foot, the markings differ as shown in Fig- 
ure 19. If the wall is stationary, new 
markings with the goal line as the starting 
point and the starting point as the goal 
line should be made. 

Test: The ball is placed on the starting line. On the signal, 
Ready, Go! the ball is dribbled and kept 
to the right of the restraining line until 
you are at a point where you can kick the 
ball with your right foot (as though you 
were sending a pass to another person) 
and hit the wall. After the pass, run ahead 
to recover the ball as it rebounds and con- 
tinue dribbling toward the goal until you 
are close enough to kick for goal. You 
must kick for goal before you cross the 
6-yard line. Go as fast as you can without 
losing control of the ball. You are given 
four trials, with the wall to your left 
making the pass with the right foot; then 
there are four more trials, with the wall 
on your right, passing with the left foot. 

Note to test administrators: Allow two 
practice trials with each foot. Record 
actual trials in half seconds. Trials which 
include errors must be repeated. 




Scoring: Your score will be the total amount of time it 
takes you to complete the eight trials, 
each one timed separately. 



Figure 19. Field Markings for Soccer Combination of Skills Test

a — markings for kick with the right foot
b — markings for kick with the left foot
|| || — goal posts

Reliability: r = .93 computed by the odd-even method;
.96 when corrected by the Spearman-Brown
formula. The subjects were 124 fifth
and sixth grade children from public
schools in Des Moines, Iowa, and in Webster
Groves, Missouri.

Validity: For the 92 cases from Webster Groves, r = .92 with 
a subjective rating criterion. For the 32 
cases from Des Moines, r = .53 with sub- 
jective ratings. 

Scores: The range for fifth grade, 95.5 to 226.5 seconds; me- 
dian 139.2. 

The range for sixth grade, 90.0 to 191.0 
seconds; median 130.3. 

The range for girls only, 102.5 to 226.5
seconds; median, 143.4. 

The range for boys only, 90.0 to 198.0 
seconds; median 126.7. 

T-scores: See page 83. 

Reference: Bontz, Jean: An Experiment in the Construction 
of a Test for Measuring Ability in Some 
of the Fundamental Skills Used by Fifth 
and Sixth Grade Children in Soccer, Unpublished
M.A. thesis, University of
Iowa, 1942.

Comments: This test has proven interesting to players. If 
a stationary wall, such as a school build- 
ing, is adjacent to the playing field, it 
can be practiced outside of class time. 
Such an arrangement leaves the playing 
field free. It has been used with high 
school and college ages, but data are not 
available on the older groups. 

BATTERY OF SOCCER TESTS 

The one best test appears to be volleying. A good 
two-item battery of tests is the combination of passing and receiv- 
ing with volleying, which Schaufele found to give a coefficient of 
.85 with the total test criterion, or .63 with subjective ratings. The 
intercorrelation of these two tests is .46. The best weighting of the 
scores is: 

2. passing and receiving + 1. volleying

An equally good two-item battery is the combination of judgment
in passing with volleying, which gives a coefficient of .85 with the
total test criterion (subjective estimate figures not given). The
intercorrelation of these two tests is .41. The recommended
weighting of scores is:

2. judgment in passing + 1. volleying

The combination of the three tests gives a coefficient of .89 with
the total test scores criterion. The correct weighting is:

2. passing and receiving + 1. volleying + . judgment in passing
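
The two-test battery coefficients can be checked against the single-test figures with the standard formula for a multiple correlation from two predictors. The text does not print this formula, so the sketch below (hypothetical function name) is an illustration of the usual computation rather than a record of the original authors' procedure.

```python
from math import sqrt


def multiple_r(r1c, r2c, r12):
    """Multiple correlation of a criterion with two tests.

    r1c, r2c: validity of each test against the criterion.
    r12: intercorrelation of the two tests.
    """
    r_squared = (r1c ** 2 + r2c ** 2 - 2.0 * r1c * r2c * r12) / (1.0 - r12 ** 2)
    return sqrt(r_squared)
```

With volleying (.77) and passing and receiving (.68) against the total test criterion and an intercorrelation of .46, the result is about .85; with the subjective-rating validities (.57 and .50) it is about .63. Both agree with the battery figures reported for these tests. The lower the intercorrelation, the more the second test adds.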


TABLE V

T-Scales for Soccer Tests

T-Score   Volleying (a)   Passing and      Judgment in     T-Score
          (253) (c)       Receiving (b)    Passing (b)
                          (84) (c)         (84) (c)

79        41                                               79
77        25                                               77
74        24                                               74
72        23                                               72
71        22                               7               71
70        21              10                               70
69        20                                               69
67        19                               6               67
66        18                                               66
63        17              9                                63
62                                         5               62
61        16                                               61
59        15                                               59
57        14              8                                57
55        13                                               55
53        12                                               53
52                        7                4               52
49        11                                               49
46        10              6                3               46
45                                                         45
43        9                                                43
42                        5                                42
40        8                                2               40
37        7                                                37
36                        4                                36
33        6                                                33
31        5                                1               31
30                                                         30
29                        3                                29
28        4                                                28
25        3               2                                25
21        2                                                21

(a) Scale constructed on data collected on ninth and tenth grade girls, Fairview High School, Fairview Village,
Ohio; and ninth grade girls in Blue Island Community High School, Blue Island, Illinois.

(b) Scale constructed on data collected on ninth and tenth grade girls, Fairview High School, Fairview Village,
Ohio.

(c) Indicates the number of subjects included in the scale.








SOFTBALL 

1. Repeated Throws 

Equipment: 1. A number of new balls; a flat, unobstructed
wall surface about 15 feet or more high and
8 feet wide; a stop watch.

2. Markings

Draw a line on the wall 7½ feet from the
floor. Draw a restraining line on the floor
15 feet from the wall, and parallel to it.

Test: The player being tested stands any place behind the 
restraining line and facing the wall. On the 
signal, Ready, Go! the player throws the ball
against the wall so that it hits above the 7½-foot
line, catches it, and repeats this as many
times as he can in 30 seconds. One ball is 
used throughout the test; if it gets out of con- 
trol, it must be recovered by the player being 
tested. (The loss of time is considered suffi- 
cient penalty.) Foot faults (stepping on or 
over the line) are watched by the scorer and 
the player is told to move back. Any throws 
made while the player is on or over the line 
do not count. A rest of two minutes is recom- 
mended between trials; this can be easily ad- 
ministered if three or four girls take throws 
at the same target. Six trials are given. 

Scoring: One point is counted each time the ball hits on or 
above the 7½-foot line, provided the throw
was made when the player was behind the re- 
straining line. The score for the entire test is 
the total of six trials of 30 seconds each. 

Reliability: r = .89 by the odd-even method, .94 corrected 
by the Spearman-Brown formula; subjects 
were 66 college women. Underkofler found 
slightly lower coefficients on a similar test for 
junior high school girls. The 14-inch ball
was used, the line on the wall was 10 feet high
and the restraining line was 10 feet from
the wall. In this case the r was .73 for first 
and second trials, stepped up to .84 because 
the sum of the two trials was used as the 
score. 

Validity: In the junior high school study cited above, the r 
with a subjective rating criterion was .64. 
In another study this test was given to 173 
college women, in various institutions lo- 
cated in the central states. The coefficient 
was .51 with a subjective rating criterion. 
The comparatively low validity is doubtless 
partially explained by the fact that the same 
persons did not make all of the ratings. (See 
reference 1 below.) 

T-scores: See page 88.

References: 1. Thomas, Jesselene: “Skill Tests,” Official 
Softball-Volley Ball Guide, 1943, pp. 21-27. 
(Report of a project of the Research Com- 
mittee of the Central Association of Physical 
Education for College Women, Aileen Car- 
penter, Chairman.) (Quoted by permission 
of A. S. Barnes & Co.) 

2. Underkofler, Audrey: A Study of Skill Tests 
for Evaluating the Ability of Junior High 
School Girls in Softball. Unpublished M.A. 
thesis, University of Iowa, 1942. 

Comments: This test is highly reliable but does not differ- 
entiate clearly between students in the 
middle ability group. The better players will 
be able to throw the ball with sufficient force 
so that it rebounds to them without bounc- 
ing. To a certain extent it measures accuracy 
as well as power. It correlates highly with the 
distance throw (intercorrelation .81) so 
there is no need to give both tests. It is easy 
to administer and can be given indoors on a 
rainy day. If you have sufficient wall space for
ten targets, the entire test can be administered
to a class of forty girls in fifteen minutes.

2. Distance Throw 

Equipment: A number of regulation softballs, and a field. 
(See Figure 5 for field markings.) 

Test: The player stands behind the line and throws the 
ball as far as possible with an overhand or 
sidearm motion. The player is limited to 
one step, which must be taken behind and 
not over the line. Three throws constitute 
one trial, and only the best throw of the three 
is measured and recorded. Three trials are 
permitted (9 throws in all). 

Scoring: The throw is measured as the distance in feet from 
the starting line to the spot where the ball 
first touches the ground. The best of the 
three recorded throws is used as the player’s 
score. 

Reliability: r = .95, computed on successive trials. The 
subjects were 118 girls in the seventh and
eighth grades in the Intermediate School, 
Riverside, Illinois. 

Validity: r = .63 with ratings, subjects were college girls 
(same group as for Test 1). 
r = .81 with ratings, subjects were 118 seventh
and eighth grade girls.

T-scores: See page 88. 

References: Same as for Test 1. 

Comments: Ability to throw the ball long distances is im- 
portant in softball, and since there is a rela- 
tionship between the distance that the ball 
can be thrown and the ability to throw the 
ball with speed, it seems all the more impor- 
tant that this skill be measured. This appears 
to be the best single test yet devised for soft- 
ball playing ability. 



3. Batting

(Note: The study of the test described below was made 
using a fourteen-inch softball, rather than the regulation 
twelve-inch ball. The pitching distance was 30 feet 
rather than the official 35 feet for girls. The 45-foot diamond
was used, rather than the 60-foot. These points
should be kept in mind in interpreting results.) 

Equipment: 1. Several fourteen-inch balls, and regulation 
softball bats. 

2. Markings 

A 45-foot diamond, with infield marked; 30- 
foot pitcher’s distance. 

Test: The contestant stands in the batter’s box and is given 
10 trials to hit legal pitches which are called 
strikes by an umpire. “Balls” are disre- 
garded. The effect of the personal element is 
held somewhat constant by using the same 
pitcher and umpire throughout the adminis- 
tration of the test. 

Score: Five points for each outfield hit, 3 points for an infield
hit, 1 point for a foul, and 0 points for
a ball struck at and missed or for a called
strike. The score is the total number of
points earned in the ten trials.
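
The point values above lend themselves to a small tally routine. The outcome labels and the function name below are hypothetical conveniences for the scorer; only the point values and the ten-trial total come from the test description.

```python
# Point values taken from the scoring rule; the label strings are illustrative.
POINTS = {
    "outfield": 5,
    "infield": 3,
    "foul": 1,
    "miss": 0,           # ball struck at and missed
    "called_strike": 0,
}


def batting_score(outcomes):
    """Total points for the ten trials of the batting test."""
    if len(outcomes) != 10:
        raise ValueError("the test consists of ten trials")
    return sum(POINTS[outcome] for outcome in outcomes)
```

A batter with two outfield hits, three infield hits, two fouls, and three misses would score 21 points.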

Reliability: r = .65 by the odd-even method, raised to .79 
with the Spearman-Brown correction. The 
subjects were 118 girls in seventh and eighth 
grades. 

Validity: The validity coefficient of this test was found to 
be .72 for the group described above. The 
criterion was a subjective rating, made by 
two students in each class, plus two times the 
rating score given by the instructor. 

T-scores: See page 88. 

Reference: Underkofler. (Same as for Test 1.) 

Comments: Since batting comprises such a large share of
the offensive play in softball, no battery of
tests seems complete without it. This is the
first test that presents any statistical data. It
is recognized that, even though the same
pitcher is used, conditions are not
the same for all batters, as that pitcher will
vary in his deliveries. Before much confidence
can be placed in the test, further study is
desirable with the test administered with
official balls, pitching distance, and diamond.
The use of a nationally rated softball umpire
would also be helpful.

BATTERY OF SOFTBALL TESTS 

The best single test appears to be the distance throw. 
The intercorrelation between Underkofler’s batting test and her 
distance throw is relatively high (.63). When the two are combined 

TABLE VI 

T-Scales for Softball Tests 


T- 

Score 

Repeated , * 
Throws 
(//«)* 

Repeated 11 
Throws 
( 66 )' 

Distance ” 
Throw 

Batting * 
<"*)* 

T - 

Score 

76 

37 


*S» 

4 * 

76 

75 



1*9 


75 

74 



187 


74 

73 



126 


73 

7 * 


116-117 

!84 

4 ° 

72 

7 1 



1*3 


7 » 

70 

36 


181 


70 

69 



180 


89 

68 



ll8 

38 

68 

67 

35 


**7 


67 

66 


114-115 

ll 5 

36 

66 

65 



114 

35 

65 

64 

34 


118 


64 

63 



110 


®3 

6s 

33 

ns-113 

109 

34 

68 

61 

3 * 

110-111 

107 


61 

60 


108-109 

106 

3 * 

60 

59 



104 

30 

59 

58 


106-107 

103 

*9 

58 

57 

S 1 


IOI 

28 

57 

56 


104-105 

100 

27 

56 




TABLE VI (Continued) 

T-Scales for Softball Tests 

T-Score   Repeated Throws a (118) d   Repeated Throws b (66) d   Distance Throw c   Batting a (118) d   T-Score

55 

30 

100-103 

98 

*6 

55 

54 


98-99 

96 

*4 

54 

53 

*9 

96-97 

95 

*3 

53 

5* 



93 

aa 

5* 

5 1 


94-95 

98 


5* 

50 

5(8 

9*-98 

90 

81 

50 

49 


9°V l 

87 

SO 

49 

48 


88-89 

85 

*9 

48 

47 

*7 

86-87 

8a 


47 

46 


88-85 

80 

18 

46 

45 

*6 


78 

17 

45 

44 

*5 

80-81 

75 

l6 

44 

43 


76-79 

73 


43 

4* 

*3 

74-75 

70 

»5 

4® 

4» 

as 

7*73 

68 

>4 

4i 

40 

si 

70-71 

65 


40 

39 

>9 


63 

13 

39 

38 

18 


60 

18 

38 

37 

l6 

68-69 

58 

11 

37 

36 

>4 

66-67 

55 

9 

36 

35 

>3 

64-65 

53 

8 

35 

34 

IO 


50 


34 

33 


6a-6g 

48 


33 

3* 

9 


46 

6 

3* 

31 

7 

60-61 

43 


Si 

3® 


58-59 

41 


30 

*9 



40 


®9 

*8 

4 



1 

28 

*7 





87 

*6 


56-57 



a6 

*5 





*5 

*4 

a 




*4 


a Scale constructed on data collected on seventh and eighth grade girls in Riverside School, Riverside, 
Illinois. (For form of the tests see discussion on reliability of Test 1.) Score is the sum of two trials. 

b Scale constructed on data obtained on college women at University of Iowa, and University of Minnesota.
Score is sum of six trials. 

c Scale constructed on data obtained on college women. (Quoted by permission of A. S. Barnes & Co.) 
d Indicates number of cases represented in the scale. 



the correlation is not high enough to warrant giving the second 
test. Since the distance throw and repeated throws have been shown 
to measure quite similar abilities, no combination is possible there. 

Various base running tests have been tried, but base running seems to be 
a relatively unimportant skill to include in a limited battery. It is 
known that higher validities are obtained when the test includes 
running from home plate to second base than in running to first 
base only or in running all four bases. Perhaps the best battery to 
recommend at present is a combination of ratings of batting skill 
(see Chapter 7) with either the distance throw or repeated throws. 


SPEEDBALL 
1. Lift to Others 

Equipment: 1. Soccer balls, net (volley ball, badminton, 
or tennis), and standards to hold the net at 
2½-foot height. 


Figure 20. Field Markings for Speedball Lift to Others Test 
○ players in position ready to take the test 


2. Markings 

One line on each side of the net, parallel to 
the net and 6 feet from it, and extending the 
length of the net. Three-foot squares must 
be drawn at 3-foot intervals all along both 
sides of the net. (See Figure 20.) 





Test: Each player stands behind the 6-foot line directly 
behind a 3-foot square, with a soccer ball 
placed on the line. The player then lifts the 
ball with either foot, attempting to send the 
ball across the net and into the square diago- 
nally opposite him and to his right. The 
player on the opposite side recovers the ball 
and lifts the ball back to the 3-foot square 
diagonally to his right and across the net. If 
the ball touches the net and goes over and 
lands within the designated area, the score is 
counted. Five trials to the right and five to 
the left are given. Partners on opposite sides 
of the net keep each other’s scores and report 
to student scorers. 

Scoring: The total number of correct lifts out of ten trials 
is the score, each accurate and successful lift 
counting one point. 

Reliability: The reliability, by correlating odd and even 
trials, was .87. When the Spearman-Brown 
Prophecy Formula was applied, to step the 
test up to actual length, the reliability was 
.93. The subjects were 72 high school girls, 
Parsons, Kansas. The tests were given at the 
end of the season. 
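The step-up from the odd-even coefficient to the full-length reliability can be checked with the usual form of the Spearman-Brown Prophecy Formula, r_full = 2r / (1 + r). The short Python sketch below is only an illustration; the function name and the `factor` parameter are assumptions made for this example and do not come from the Buchanan study.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Estimate the reliability of a test lengthened by `factor`.

    With factor = 2 this is the usual correction applied to an
    odd-even (split-half) coefficient: 2r / (1 + r).
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Odd-even coefficient reported for the Lift to Others test.
r_odd_even = 0.87
print(round(spearman_brown(r_odd_even), 2))  # 0.93
```

The same function applied to the softball batting test (odd-even r = .65) gives .79, which agrees with the corrected value reported for that test.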

Validity: The validity of this test was found to be .88 for 
the group described above. The criterion 
was subjective ratings, by three teachers of 
speedball playing ability. 

T-scores: See page 99. 

Reference: Buchanan, Ruth E.: A Study of Achievement 
Tests in Speedball for High School Girls. 
Unpublished M.A. thesis, University of 
Iowa, 1942. 

Comments: The lift of a stationary ball in speedball is used 
frequently to begin the game and to start 
play after each violation. Thus, it would 
seem to be an important enough skill to be 
included in most test batteries. Unless players 
have practiced this skill a great deal, and 
are practically at the peak of their performance 
ability, one could not expect such 
high reliability. 

2. Throwing and Catching While Standing 

Equipment: 1. Soccer balls, wall space, stop watch. 

2. Markings 

Line drawn 6 feet from the wall and parallel 
with it, extending the length of an uninter- 
rupted wall space. 



① players in position ready to take test 
② players in position ready to score test 
③, ④ players in position waiting their turns at the test 


Test: Each player stands behind the 6-foot line with a soccer 
ball in his hands. The scorer for each player 
is stationed to the rear and slightly to one 
side. At a signal, the player begins by throwing 
the ball to the wall, continues by catching 
and repeating. Each trial is 30 seconds in 
length. Each player is given 5 trials, but 
these trials should not be in succession. 

Scoring: Each throw made from behind the line counts one 
point. The total for each trial is recorded 
and the score for the test is the average for 
the five trials. 

Reliability: By the odd-even method the coefficient was .85. 

When corrected by the Spearman-Brown 
Prophecy Formula, the reliability was .93. 
The subjects were the same as for Test 1. 

Validity: The coefficient was .79; the criterion was a rating 
by experts; the subjects were the same as 
for Test 1. 

Reference: Same as for Test 1. 

T-scores: See page 99. 

Comments: This test can be administered in a very short 
time. If the players are in lines of three or 
four, and rotate in taking turns, the element 
of fatigue is prevented from affecting the 
scores. The test author suggests that because 
of the high reliability, the number of trials 
might be reduced to as few as two. This is 
very similar to the wall pass test in basket- 
ball, and substantiates the evidence there 
that very few trials are required. The high 
reliability is typical of tests involving time. 
Note the same tendency in softball “repeated 
throws” and in volley ball “repeated 
volleys.” 

3. Kick-Ups 

Equipment: 1. Soccer balls. 

2. Markings 

Three feet out from the side line, mark a 
line parallel to the sideline and 2 feet in 
length. Complete a square, with 2-foot sides, 
as shown in the diagram below. Locate a 
point which is on the imaginary extension of 
a diagonal line across the square and 4 feet 
from the nearest corner. Mark this spot and 
use it as a starting line. Arrange as many 
similar targets as needed for administering 
the test efficiently. 



Figure 22. Field Markings for Speedball Kick Up Tests 

T — thrower, ready to put ball in play 
P — player in starting position ready to take test 


Test: The player being tested stands behind the starting line. 

The thrower takes a position behind the side- 
line and drops the ball in the 2-foot square. 
As soon as the thrower releases the ball, the 
player being tested runs forward to execute 
a kick-up. Five practices are given followed 
by ten trials. 




Scoring: The total number of successfully executed and 
caught kick-ups of the ten trials is the score. 
If the ball is not thrown so that it lands in 
the 2-foot square, the trial is repeated. 

Reliability: By the odd-even method, the coefficient was 
.87; corrected by the Spearman-Brown formula, 
.93. The subjects were the same as described 
in Test 1. 

Validity: r = .85. 

T-scores: See page 99. 

Reference: Same as Test 1. 

Comments: This test has proven quite challenging to play- 
ers and it is a skill that is practiced a great 
deal by all players, regardless of their posi- 
tion. It includes the skills of judging the 
speed and height of the ball, the ability to 
execute the kick-up, and to catch and hold 
the ball. 

4. Dribbling and Passing 

Equipment: 1. Soccer balls and five objects (Indian clubs, 
jump standards). 

2. Markings 

Mark a starting line, 60 yards from one of 
the end lines of the field and parallel to it. 
Place one object 3 yards to the right of the 
right hand goal post (as you face the goal 
from the center of the field), and 10 yards 
from the end line. 

Place the other objects at 10 yard intervals 
in a line parallel to the sideline, so that one 
object is 10 yards from the starting line. 

If you wish to give the test to as many as four 
players at one time, use an arrangement simi- 
lar to that on the diagram, marking the goal 
space with 12-inch lines, perpendicular to 
the end line. The two set-ups on the right, as 
you face the goal line, will permit administration 
of the test with the pass to the left, while 
the two on the left are for the pass to the 
right. 



Figure 23. Field Markings and Plan for Administering Speedball Dribbling and 
Passing Tests 

x — obstacles 

Field is marked for two courses kicking from the right and two from the left 


Test: Ball is placed on the starting line, in line with the 
objects. On signal, the player starts dribbling 
forward, going to the right of object #1, to 
the left of #2, and so on, and to the right 
of object #5, from which place he immedi- 
ately passes the ball to the left, attempting 
to send it between the goal posts (or indica- 
tions of goal posts). There are five trials; 
then five more trials from the other side, 
where it is necessary to pass to the right to 
send the ball through the goal area. To do 
this, the player must go to the right of the 
first object. This makes a total of ten trials. 

Scoring: The number of seconds to dribble 50 yards ten 
times, minus ten times the total number of 
accurate passes. 
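The scoring rule above is simple arithmetic, and a short sketch may make it concrete. The function name and the sample times below are hypothetical and are not taken from the study. Because time is added and accurate passes are subtracted, a lower score would indicate better performance under this reading of the rule.

```python
def dribble_pass_score(times_sec, accurate_passes):
    """Total seconds for the ten 50-yard dribbles, minus ten points
    for each accurate pass (lower is better under this reading)."""
    if len(times_sec) != 10:
        raise ValueError("the test calls for exactly ten timed trials")
    if not 0 <= accurate_passes <= 10:
        raise ValueError("accurate passes must be between 0 and 10")
    return sum(times_sec) - 10 * accurate_passes


# Hypothetical player: 12 seconds on every trial, 6 accurate passes.
print(dribble_pass_score([12.0] * 10, 6))  # 60.0
```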

Reliability: Odd-even r = .96, corrected by Spearman- 
Brown formula = .98. The subjects were 
the same as for Test 1. 

Validity: r = .69, with a criterion of ratings described in 
Test 1. 

Reference: Same as for Test 1. 

Comments: This test would be improved with a little more 
refinement. Also, a line 10 yards from the 
end line and parallel to it, passing under the 
fifth object, would increase the accuracy in 
timing. The addition of the requirement that 
the ball be passed on or before reaching the 
6-yard line would prevent players from going 
too near before making the pass. This test 
will take longer to administer than any of the 
others. It is possible that further study will 
show that the number of trials from each 
side could well be reduced to three, per- 
haps even two, and still be highly reliable. 

5. Passing 

Equipment: Same as for Test 4 (dribbling and passing). 

Test: Same as Test 4. 

Scoring: The number of accurate passes out of ten trials. 

Reliability: r = .84 by the odd-even method, .91 when cor- 
rected by the Spearman-Brown formula. 
Same subjects were used as for Test 1. 

Validity: r = .86 with the criterion of ratings. 

T-scores: See page 99. 

Reference: Same as for Test 1. 

Comments: This method of scoring results in higher va- 
lidity. Further study should be done to 
determine the effect of shortening the dis- 
tance required for dribbling before the pass 
is made. 




6. Dribbling 

Equipment: Same as for Test 4 (dribbling and passing). 

Test: Same as Test 4. 

Scoring: The total number of seconds to dribble 50 yards 
10 times. 

Reliability: r = .96 for the odd-even method, .98 corrected 
by the Spearman-Brown formula. The subjects 
were the same as for Test 1. 

Validity: r = .57. 

Reference: Same as for Test 1. 

Comments: Dribbling ability is apparently not closely related 
to general playing ability. This test is in- 
cluded here to illustrate how a test can be 
highly reliable and still not have satisfactory 
validity; also, how the elements in Test 4 
(dribbling and passing) were studied until 
one feels safe in concluding that passing the 
ball while moving is the important element 
of the test. 

BATTERY OF SPEEDBALL TESTS 

If time does not permit giving all the tests, the best 
two-item battery is throwing and catching while standing and 
the passing test. Buchanan found that this combination gave a 
high correlation (R = .93) with the criterion (general playing 
ability, judged by three teachers). This battery can be adminis- 
tered to a class of forty girls in forty minutes, provided that a set- 
up is arranged for the passing test similar to that shown in Figure 
22. The proper weighting for this two-item battery is 

1 × throwing and catching + 3 × passing 

That the two tests do not measure exactly the same skill is shown 
in the relatively low intercorrelation of .57. 
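The figures reported for this battery can be checked against one another with the standard formula for the multiple correlation of a criterion with two tests. The sketch below assumes that formula applies here; the source does not state how Buchanan computed R, and the function name is hypothetical. The inputs are the validity coefficients given above for throwing and catching while standing (.79) and for passing (.86), together with their intercorrelation (.57).

```python
import math


def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two tests.

    r1, r2 : correlation of each test with the criterion
    r12    : correlation between the two tests
    """
    r_squared = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)


# Throwing and catching (.79), passing (.86), intercorrelation (.57).
print(round(multiple_r(0.79, 0.86, 0.57), 2))  # 0.93
```

The result agrees with the reported R of .93. The formula also shows why a low intercorrelation matters: the less the two tests overlap, the more the second test adds to the first.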



TABLE VII 

T-Scales for Speedball Tests a 

                       THROWING AND             THROWING AND
LIFT TO OTHERS         CATCHING STANDING        CATCHING STANDING
(163) b                (72) b                   (159) b
10 trials              Mean of 5 trials         Mean of 2 trials

Raw Score  T-score     Raw score   T-score      Raw score   T-score
10         71          19.8-20.2   75           27.8-29.7   72
9          63          19.3-19.7   72           25.8-27.7   65
8          57          18.8-19.2   69           23.8-25.7   57
7          53          18.3-18.7   67           21.8-23.7   52
6          47          17.8-18.2   65           19.8-21.7   48
5          43          17.3-17.7   62           17.8-19.7   45
4          39          16.8-17.2   60           15.8-17.7   41
3          34          16.3-16.7   58           13.8-15.7   37
2          30          15.8-16.2   56           11.8-13.7   33
1          25          15.3-15.7   54           9.8-11.7    26
                       14.8-15.2   51
                       14.3-14.7   46
                       13.8-14.2   43
                       13.3-13.7   42
                       12.8-13.2   41
                       12.3-12.7   40
                       11.8-12.2   39
                       11.3-11.7   36

KICK-UPS               PASSING                  PASSING
(262) b                (190) b                  (72) b
10 trials              5 trials                 10 trials

Raw Score  T-score     Raw Score   T-score      Raw Score   T-score
10         72          5           68           10          71
9          66          4           51           9           66
8          59          3           43           8           62
7          54          2           35           7           59
6          50          1           28           6           57
5          47                                   5           51
4          43                                   4           46
3          38                                   3           40
2          34                                   2           35
1          29                                   1           25
0          21

a The scales were constructed from data obtained in the Buchanan study on high school girls in Parsons, 
Kansas. 

b The number of subjects included in each scale is indicated at the top. 

The scale on throwing and catching standing (mean of 2 trials) illustrates especially well the effect of 
making the step-intervals too large, resulting in large gaps in the T-scale. The range in all the tests is small, 
which accounts for the gaps and irregularities in each of the scales. 
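How a small range of raw scores produces gaps in a T-scale can be seen from a minimal sketch. It assumes a normalized T-scale: each raw score is placed at the proportion of cases falling below it plus half of those falling on it, that proportion is converted to a normal deviate, and the deviate is expressed on a scale with mean 50 and standard deviation 10. The procedure used for the scales in this manual may differ in detail, and the raw scores below are invented purely for illustration.

```python
from collections import Counter
from statistics import NormalDist


def t_scale(raw_scores):
    """Map each distinct raw score to a normalized T-score."""
    n = len(raw_scores)
    counts = Counter(raw_scores)
    below = 0
    scale = {}
    for score in sorted(counts):
        # Proportion of cases below this score plus half of those at it.
        p = (below + counts[score] / 2) / n
        scale[score] = round(50 + 10 * NormalDist().inv_cdf(p))
        below += counts[score]
    return scale


# Invented scores on a short test with only five possible values.
sample = [3, 4, 4, 5, 5, 5, 6, 6, 7]
for raw, t in t_scale(sample).items():
    print(raw, t)
```

With only five distinct raw scores, the scale can contain only five T-scores, so neighboring entries are separated by several points. This is the kind of gap the note above describes.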







TENNIS 

1. Wallboard Test 

Equipment: 1. Backboard or wall, approximately ten feet 
in height and allowing fifteen to twenty feet 
in width per person taking the test at one 
time; stop watch; two balls and a racquet per 
player. Balls should be in good condition 
and racquet should be tightly strung. Box for 
extra balls, about 12 inches long, 9 inches 
wide, and 3 inches deep, placed on the floor 
where the restraining line (described below) 
joins the side (at the left for right handed 
players and right for left handed players). 
A racquet may be substituted for the box; 
the racquet is placed on the floor in the same 
position as that described for the box, and 
the balls are laid on the face of the racquet. 

2. Markings 

A line, 3 inches in width, should be drawn 
on the wall, to represent the net, so that the 
top is 3 feet from the floor. A restraining 
line, 5 feet from the base of the wall should 
be drawn on the floor, parallel to the wall. 
(See comments below for change in position 
of the restraining line.) 

Test: On the word Go of the signal, Ready, Go! drop the 
ball and let it hit the floor once and then 
start rallying it against the wall. Continue 
rallying until the signal to stop. The ball 
may bounce any number of times or it may 
be volleyed. At the start of the test and 
whenever a new ball is put in play, it must 
be allowed to bounce before being hit. Any 
stroke may be used but all strokes should be 
played from behind the restraining line. You 
may cross the line to retrieve the ball but 
hits made from this position are not scored. 
If the ball gets out of control, you may take 
another ball from the box. (See p. 43 for 
a discussion of the demonstration for this 
test.) 

Scoring : Each time a ball strikes the wall on or above the 
net line, having been hit from behind the 
restraining line, one point is scored. Three 
trials are given, and the score is the sum. The 
length of each trial is 30 seconds. 

Reliability: Test appears highly reliable. 

Validity: Using the criteria of rankings by three experts 
and of round robin tournament rankings, 
the correlations range from .90 to .92, ac- 
cording to Dyer. 

Reference: Dyer, Joanna Thayer: “Revision of the Back- 
board Test of Tennis Ability,” Research 
Quarterly, 9, March, 1938, p. 25. (Quoted by 
permission of the Research Quarterly.) 

Comments: This test is satisfactory for a classification test. 

It should not be used to measure form in 
tennis. It can be administered easily and is 
completely objective. 

We recommend moving the restraining 
line to a distance of 20 or 25 feet from the 
wall (distance depends somewhat on the wall 
surface and the type of rebound it gives). See 
comments in Chapter 2, p. 15. 

2. Tests for Form 

See Chapter 7 for suggestions on rating tennis form. 

No objective tests are available. 

VOLLEY BALL 

1. Repeated Volleys 

Equipment: 1. Well inflated balls, unobstructed wall space 
10 feet long and 15 feet high (preferably several 
such areas), and a stop watch. 

2. Markings 

A line 10 feet long marked on the wall at 
net height, 7½ feet from the floor. 

A line on the floor, opposite the wall mark- 
ing, 10 feet long and 3 feet from the wall. 

Test: The player being tested shall stand behind the 3-foot line, 
and shall toss the ball to the wall with an 
underhand movement. When it returns, she 
shall volley it repeatedly against the wall 
above the net line for 15 seconds. The ball 
may be set up as many times as desired or 
necessary; it may be caught and re-started 
with a toss as at the beginning. If the ball 
gets out of control, it must be recovered by 
the subject and brought back to the 3-foot 
line to be started over again as at the begin- 
ning. 

This procedure shall be repeated until ten 
trials have been given, each 15 seconds in 
length. 

Scoring: The score for one trial shall be the number of 
times the ball is clearly batted (not tossed) 
from behind the 3-foot line on the floor to 
the wall above or on the net line. The score 
for the test shall be the sum of the five best 
trials out of ten. 
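The "five best out of ten" rule is easy to misapply when scores are totaled by hand, so a brief sketch is given here. The function name and the trial counts are hypothetical and are not data from the French and Cooper study.

```python
def repeated_volleys_score(trials):
    """Sum of the five best trials out of ten 15-second trials."""
    if len(trials) != 10:
        raise ValueError("the test calls for exactly ten trials")
    best_five = sorted(trials, reverse=True)[:5]
    return sum(best_five)


# Hypothetical counts of legal volleys in each of ten trials.
trials = [12, 15, 9, 14, 16, 11, 13, 15, 10, 14]
print(repeated_volleys_score(trials))  # 74
```

Dropping the five poorest trials means that a trial spoiled by a lost ball does not lower the score.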

Reliability: r = .78, correlating by the odd-even method. It 
was not corrected by the Spearman-Brown 
formula since only five of the trials are used 
in the final score. Reliability computed on 
the five best trials on successive days should 
yield an equally high, perhaps higher, coefficient. 
The subjects were 47 senior high 
school girls, University High School, Iowa 
City, Iowa. 

r = .96 computed by the odd-even method; 
subjects were 75 college women, University 
of Minnesota. Ten trials; all used. 



Validity: r = .72 when correlated with a criterion of subjective 
ratings, made by four experienced 
teachers of volley ball. 

T -scores: See page 105. 

Reference: French, Esther, and Cooper, Bernice: “Achievement 
Tests in Volley Ball for High School Girls,” 
Research Quarterly, 8, May 1937, p. 150. 
(Quoted by permission of the Research 
Quarterly.) 

Comments: This test measures the player’s ability to control 
the ball, and his judgment on position and 
playing the ball. It can be used for classifica- 
tion purposes early in the term and then 
again later to measure achievement. It can 
be administered economically with a large 
number of players taking the test at the same 
time, if wall space and balls permit. 

2. Serve 

Equipment: 1. Regulation court and tightly strung net, 
well inflated balls. 

2. Markings 

(a) A line across the court 5 feet inside and 
parallel to the end line. 

(b) A line across the court parallel to the net 
and 12½ feet from the center line which 
is directly under the net. 

(c) Two lines each 5 feet inside the court 
and parallel to the side lines, extending 
from the line under the net to the 5- 
foot line described in (a). 

(d) The score values of each area should be 
marked on the floor as indicated in the 
diagram (Figure 24). 

Test: The player being tested stands in the proper serving 
area on the court opposite the target and is 
given ten trials to serve the ball into the tar- 
get in the court across the net. Any legal 
serve is permitted. Foot faults shall count as 
trials; “let” serves shall be re-served and do 
not count as trials. The scorer stands on a 
chair near one sideline about fifteen feet 
from the net. 

Figure 24. Floor Markings for Volleyball Serve Test 
1-5: score for respective areas 

Scoring: The score values are indicated on the diagram. A 
ball landing on a line separating the two 
spaces scores the higher value. A ball land- 
ing on an outside boundary line scores the 
value of the area the line bounds. Trials in 
which foot faults occur score zero. Twenty 
trials should be allowed. 

Reliability: r = .68 by the odd-even method, stepped up to 
.81 by the Spearman-Brown formula. The 
subjects were the same as for Test 1. (Uni- 
versity High School, Iowa City.) 

Validity: r = .63, with a criterion of ratings (same as Test 1). 

T-scores: See page 105. 

Reference: Same as for Test 1. 

Comments: This test is time consuming, since it requires 
so many trials. When the target is painted on 
both sides of the court, two players can be 
tested at the same time on each court. The 
use of ball chasers to keep the servers supplied 
with balls facilitates the testing. It is 
an excellent teaching device for practice of 
placement in serving. 

BATTERY OF VOLLEY BALL TESTS 

The best single test is the repeated volleys. The two 
tests together yielded a correlation coefficient with the criterion (rating 
of playing ability) of .81 in the study cited here. This combination 
can be administered at the same time, if floor space permits. 
The low intercorrelation of .39 between the two tests indicates that 
they measure different things. The formula for combining the two 
tests, to insure giving proper weight to each, is as follows: 

1. repeated volleys 4- g. serve 


TABLE VIII 

T-Scales for Volley Ball Tests 


T- 

Score 

Repeated, * 
Volleys 
(349) * 

Repeated * 
Volleys 
(lao) 1 

Repeated 
Volleys 
(*3 &) ' 

Serve * 
( 549 )' 

Serve * 

w 

T- 

Score 

80 

130-139 



48 


80 

79 






79 

78 

186-1 sg 



47 

44 

78 

77 



40 



77 

76 


*5 


46 


76 

75 

1*4-1 *5 



45 


75 

74 

188 - 1*3 



44 


74 

73 

120-1 SI 


38 

43 

40 

73 

7 * 

118-ug 

*4 


4 * 


78 

7 * 

116-117 

*3 

37 


35 

7 » 

70 


88 

35 

4 i 

34 

70 

69 

114-115 

81 

34 

40 


69 

68 

HO-113 


S 3 

39 

33 

68 

67 

108-109 


3 * 

38 


67 

66 

106-107 

80 


37 

3 * 

66 

6 5 

104-105 

19 

Si 

36 

3 * 

65 

64 

108-103 

18 

30 

34 

30 

64 

63 

100-101 

‘7 

*9 

S 3 


63 

6s 

98-99 


*7 

3 * 

*9 

6* 

6l 

98-97 


86 

3 ° 

88 

61 

60 

94-95 

16 

*4 

*8 

*7 

60 

59 

9*-93 


88 

*7 

*6 

59 

58 


*5 

SI 

*6 

*5 

58 

57 

90-91 


80 

*5 

*4 

57 

56 

88-89 

»4 

>9 

*4 

*3 

56 









TABLE VIII (Continued) 
T-Scales for Volley Ball Tests 


r- 

Score 

Repeated * 
Volleys 
(349) * 

Repeated h 
Volleys 
(zao)* 

Repeated ° 
Volleys 
('*) * 

Serve 4 
(349 ) ' 

Serve * 

(^2) * 

T- 

Score 

55 

86-87 


18 

23 

22 

55 

54 


13 


22 

21 

54 

53 

84-85 


17 

21 

20 

53 

5* 

82-83 

12 


80 

*9 

5« 

5 1 

78-81 


l6 

*9 

18 

5* 

5° 

76-77 


15 

17 

*4 

50 

49 


11 


l6 

13 

49 

48 

74-75 


14 

15 

12 

48 

47 

72-73 



14 

11 

47 

46 

70-71 

IO 


>3 

10 

46 

45 

68-69 


13 

12 

9 

45 

44 

64-67 



10 


44 

43 

62-63 

9 

12 

9 

8 

43 

4* 

58-61 




7 

42 


56-57 

8 

11 

8 


4* 

40 

52-55 


10 

7 


40 

39 

48-51 



6 

6 

39 

38 

44-47 

7 

9 

5 

5 

38 

37 

42-43 





37 

38 

40-41 



4 

4 

36 

35 

38-39 



3 


35 

34 

36-37 

6 

8 


3 

34 

33 

34-35 



2 


33 

32 

3*'33 


7 

1 


32 

3* 

3°-3 l 

5 



2 

3 i 

3° 

28-29 





So 

29 

24-27 

4 




*9 

28 






28 

27 

22-23 


6 



27 

26 

80-21 





26 

*5 

18-19 





*5 

*4 

16-17 

3 




*4 

23 






23 

22 






22 

21 

14-15 





21 


a Scale constructed from data obtained on girls in Blue Island Community High School, Blue Island, Illinois, 
and in East Aurora High School, Aurora, Illinois. Score is the sum of the best five out of ten trials. 
b Scale constructed on data obtained from girls in Muscatine High School, Muscatine, Iowa; score is the 
best single score out of three trials. 

c Scale constructed on data obtained on University of Iowa students. Score is the best of three trials. 
d Same group as in a; score is total of 10 trials. 

e Same group as in c; score is total of 10 trials. 
f Indicates the number of cases in the distribution. 



BIBLIOGRAPHY 


1. Bassett, Gladys, Glassow, Ruth, and Locke, Mabel: “Studies in Testing Volley 
Ball Skills,” Research Quarterly, 8, December 1937, p. 60 

2. Bontz, Jean: An Experiment in the Construction of a Test for Measuring 
Ability in Some of the Fundamental Skills Used by Fifth and Sixth Grade 
Children in Soccer. Unpublished M.A. thesis, University of Iowa, 1942 

3. Brace, David K.: “Validity of Football Achievement Tests as Measures of Motor 
Learning and as a Partial Basis for the Selection of Players,” Research Quarterly, 
14, December 1943, p. 372 

4. Brechler, Paul W.: A Test to Determine Potential Ability in Football ( Backs 
and Ends). Unpublished M.A. thesis, University of Iowa, 1940 

5. Buchanan, Ruth E.: A Study of Achievement Tests in Speedball for High 
School Girls. Unpublished M.A. thesis, University of Iowa, 1942 

6. Colvin, Valerie, Glassow, Ruth B., and Schwartz, Marguerite: "Studies in 
Measuring Basketball Playing Ability of College Women,” Research Quarterly, 
10, October 1938, p. 60 

7. Cormack, Herbert P.: A Test to Determine Potential Ability in Football (Linemen). 
Unpublished M.A. thesis, University of Iowa, 1940 

8. Cozens, Frederick, Cubberley, Hazel, and Neilson, N. P.: Achievement Scales 
in Physical Education Activities for Secondary School Girls and College Women. 
A. S. Barnes & Company, New York, 1937 

9. Cozens, Frederick, Trieb, Martin H., and Neilson, N. P.: Physical Education 
Achievement Scales for Boys in Secondary Schools. A. S. Barnes & Company, 
New York, 1936 

10. Dyer, Joanna Thayer: “Revision of the Backboard Test of Tennis Ability,” 
Research Quarterly, 9, March 1938, p. 25 

11. Dyer, Joanna Thayer, Schurig, Jennie C., and Apgar, Sara L.: “A Basketball 
Motor Ability Test for College Women and Secondary School Girls,” Research 
Quarterly, 10, October 1939, p. 128 

12. Edgren, H. D.: “An Experiment in the Testing of Ability and Progress in 
Basketball,” Research Quarterly, 3, March 1932, p. 159 

13. French, Esther, and Cooper, Bernice: "Achievement Tests in Volley Ball for 
High School Girls,” Research Quarterly, 8, May 1937. p. 150 

14. Glassow, Ruth B., and Broer, Marion R.: Measuring Achievement in Physical 
Education. W. B. Saunders Company, Philadelphia, 1938 

15. Hewitt, Jack E.: “Achievement Scale Scores for Wartime Swimming,” Research 
Quarterly, 14, December 1943, p. 391 

16. Howland, Amy R.: National Achievement Standards for Girls, National 
Recreation Association, New York, 1936 

17. Hyde, Edith I.: "An Achievement Scale in Archery,” Research Quarterly, 8 , 
May 1937, p. 109 

18. Johnson, E. L.: Objective Tests in Basketball for Boys. Unpublished M.A. 
thesis, University of Iowa, 1934 

19. Jones, Edith: A Study of Knowledge and Playing Ability in Basketball for 
High School Girls. Unpublished M.A. thesis, University of Iowa, 1941 

20. Mitchell, A. Viola: “A Scoring Table for College Women in the Fifty-yard 
Dash, the Running Broad Jump, and the Basketball Throw,” Research Quar- 
terly, 5, Supplement March 1934, p. 86 

21. Neilson, N. P., and Cozens, Frederick: Achievement Scales in Physical Educa- 
tion Activities for Boys and Girls in Elementary and Junior High School. 
A. S. Barnes & Company, New York, 1934 

22. Schaufele, Evelyn F.: The Establishment of Objective Tests for Girls of the 
Ninth and Tenth Grades to Determine Soccer Ability. Unpublished M.A. thesis, 
University of Iowa, 1940 

23. Schmithals, Margaret, and French, Esther: “Achievement Tests in Field Hockey 
for College Women,” Research Quarterly, 11, October 1940, p. 84 

24. Scott, M. Gladys: “Achievement Examinations in Badminton,” Research Quarterly, 
12, May 1941, p. 242 

25. Scott, M. Gladys: “Assessment of Motor Ability of College Women through Objective 
Tests,” Research Quarterly, 10, October 1939, p. 63 

26. Thomas, Jesselene: “Skill Tests,” Official Softball-Volley Ball Guide. A. S. 
Barnes & Company, New York, 1943, p. 21 

27. Underkofler, Audrey: A Study of Skill Tests for Evaluating the Ability of 
Junior High School Girls in Softball. Unpublished M.A. thesis, University of 
Iowa, 1942 

28. Young, Genevieve, and Moser, Helen: “A Short Battery of Tests to Measure 
Playing Ability in Women’s Basketball,” Research Quarterly, 5, May 1934, p. 3 


5 


Evaluation of Physical Fitness 


THE NEED FOR FITNESS TESTS 

As a direct result of the war and war-time emphases, 
teachers have been forced to consider the outcome of their programs 
more carefully than they have ever done before. The requirements 
for waging total war have placed more and more of a premium on 
physical fitness. Physical fitness rose high in discussion of physical 
education objectives. Objectives long held were consolidated under 
a single composite term, physical fitness or total fitness. Immedi- 
ately teachers, pupils and laymen raised their respective questions, 
“Am I achieving fitness in students?” “How fit am I?” or “Am I 
improving?” and “Is physical education really contributing to fit- 
ness?” Each question was the result of varying degrees of interest, 
curiosity and doubt. Each question deserved an answer. 

Both student and layman expected a specific answer. We talked 
about fitness as an entity; therefore, it seemed justifiable to expect 
a single answer. They have become conditioned to high-power testing 
procedures with a composite score. Our I.Q. is expressed in a 
single figure; our motor quotient corresponds; the state-wide every 
pupil test gives the pupil or school a percentile ranking; the army 
gives an intelligence score; the popular magazines give the reader 
a self-administering test with a score on his powers of observation, 
or wit, or what have you. In such cases, the qualities measured are 
often more intangible than that of fitness and the scores are pre- 
sented as a valid measure of that quality even in cases not entirely 
justified. 

A survey of the history of testing reveals the development of 
highly specific forms of measuring devices. Tests are designed to 
measure motor ability, innate capacity or educability, size, strength, 
or a specific sport skill. The ones claiming to measure physical fitness
were of two types. The first type was based on the physiological
work principles and consisted principally of measures of cardiovascular
or respiratory functioning. Most of them involved highly
complex apparatus, required administration as an individual test 
although they showed such low reliability that they seemed better 
adapted to group than to individual interpretation. The second 
type purporting to measure fitness included strength tests in their 
various forms. In spite of the claims for strength tests as indicators 
of health, vigor and fitness, most teachers considered these claims 
as exaggerated and considered them as another specific, a strength 
measure. 

So the answer to the question of how to measure fitness did not 
seem to be forthcoming. It was apparently another case of “water, 
water everywhere and not a drop to drink,” i.e., tests of all kinds 
but none suitable for this particular need. 

Then as thinking on fitness became more crystallized and situa- 
tions arose which illustrated the need for different kinds of fitness, 
these specific tests seemed to be more pertinent. The first reaction 
to that realization was the general policy of accepting one or two 
tests and admitting that they were inadequate but all that was avail- 
able. The result was as many different systems as there were teach- 
ers. Evidence of this situation is to be found in the manual published 
by the United States Office of Education 21 early in 1943. With 
reference to the girls’ program the manual says that there is a lack 
of scientific evidence available on tests for girls, and that standards 
presented represent the best available and may serve as guides to 
teachers and as incentives to pupils. The measures then suggested 
are jump and reach, a potato race, soccer throw-in, and a free style 
swimming event. 

Even with the acceptance on an empirical basis of the available 
tests the problem was not solved. To be useful any test must have 
some standard, some basis for comparing the individual or the class. 
The same procedure has been followed again and again. For ex- 
ample, the Minnesota State Civilian Defense Council 18 quickly 
set up suggested standards for both sexes, from age 12 up to 35 
years and up, in order to get the state-wide Physical Fitness Club 
going. A similar purpose was behind the writing of How to Be Fit 
and Like It. 20 With data from some cases and committee judgment
on what was feasible, suggestions for norms were made to be considered
useful on a national scale.

20, 21 See reference in bibliography at the end of this chapter.

It was with this background and these needs that the teachers set 
about solving this problem. By the summer of 1943 when the United 
States Office of Education published their second manual, one on 
physical fitness for college students, 22 they had reached the first stage
in more satisfactory testing. That stage was the analysis of physical 
fitness into its component parts and selection of tests for these spe- 
cific qualities. This was followed almost immediately by the pro- 
ceedings of the physical fitness workshop which was sponsored 
by the National Association of Physical Education for College 
Women. 1 ' 

The approach in the two cases was similar. It seems to be the 
only logical one. Intelligence tests have been broken into various 
parts, e.g., number, verbal and deductive factors. This is exactly the 
same procedure. The two sources mentioned above, the college 
manual and the workshop proceedings, list the following elements 
in common: Strength, endurance, agility or body control, flexi- 
bility, and posture. The latter bulletin includes also kinesthesia, 
power, and speed. 

The Women’s Army Corps conditioning program 24 follows a 
similar analysis. Considerable emphasis is put on the development 
of strength of different muscle groups with a suggested test for each. 
The other items are posture, agility, balance, coordination, speed, 
relaxation and specific skills. 

The programs of fitness testing developed in the schools show 
certain characteristics and trends. Much more work has been done 
on tests for the college level than for the high school level. Fitness 
tests have been used more at the college level than at the high school 
level. There probably are several reasons for this. It is partially a 
reflection of the fact that more suggestions have been made for the 
older group. The apparent immediacy for the college student of 
military or civilian work demanding a high level of fitness has re- 
sulted in greater emphasis on development of fitness in the college 
programs. 

One of the outstanding points to be noted in studying testing 
programs is the great variation of test items. Some schools give a 
very limited battery, others a very extensive battery; but brief or
extensive there is very little exact duplication from school to school.
The similarity lies in testing for specific phases, usually of "motor 
fitness." The batteries are as varied as the schools. For example, 
the Research Committees of the Central and Midwest Districts of 
the National Association of Physical Education for College Women 
conducted surveys in 1944 to determine the fitness tests in use in 
the various schools of those two districts. The results showed ap- 
proximately fifty different tests in use for the evaluation of fitness. 
There seemed to be more similarity in tests used by various schools 
within each district than between the two districts. 

An example of a testing battery adjusted for the high school girl 
is a study made in Illinois. 1 * This study sets up minimum standards 
and achievement scales for twelve different items. The National 
Section on Women’s Athletics of the American Association for 
Health, Physical Education and Recreation also suggested standards 
of performance on a number of tests. 1 * 

The college men’s plans for testing fitness show almost as much 
variation as those for women. There is perhaps a little more influ- 
ence from testing programs in the armed services and for that 
reason more agreement on the basic items. An example of the men’s 
battery in wide use is that developed in Indiana. 1 In general, the 
men have attempted by means of tests to evaluate the results of 
the program more than has been done for the women. Examples 
of this work appear in the bibliography. 1 *- ** 

SELECTION OF FITNESS TESTS 

When tests are selected for specific qualities, then a 
few or many may be chosen according to the kind or level of fitness 
for which one is striving, the time available for testing, and the 
merits of the various tests. The first point must be decided in each 
particular case when one knows what duties the students are or will 
be assuming. The other two points will be discussed. 

Economy of time in test administration has already been dis- 
cussed with respect to general programs. The possibilities for 
economizing in this type of testing are probably greater than in 
many others. The first advantage is that many tests can be given in 
mass. Mass tests are usually limited only by the number who can
be put into the testing area. Partners are instructed on the method
of scoring and the test then requires just long enough to give it to 
two persons. Tests are usually limited because they require equip- 
ment or special areas for their performance. For example, arm 
strength may be measured by pull-ups or chinning in various forms, 
by push-ups in several forms, or by push-pull tests with the dy- 
namometer. All forms of hanging and pull-ups require equipment, 
and that equipment is of such a nature that it is usually very 
limited. The dynamometers are expensive and often unobtainable. 
That leaves the push-up test which can be given at different levels 
of difficulty without equipment. 

By avoiding use of two or more tests which measure the same 
thing, time can be saved. This can be determined by a subjective 
analysis of what the tests involve, or preferably by statistical analysis 
if such is available. For example, Wilson’s 28 study on strength tests 
for college women indicated certain relationships in terms of cor- 
relation coefficients. 


TABLE IX 

Inter-correlation of Strength Tests in the Wilson Study *

Related too highly to give both tests

                                1      2      3      4      5      6      7
1. Bent arm hang
2. Push-up from knees          .43
3. Push-up on bench            .54    .76
4. Pull-up                     .56    .68    .63
5. Pull-up (knees bent)        .54    .48    .53    .58
6. Rope climb (arms only)      .52    .61    .62    .62    .46
7. Push-pull
8. Vertical pull               [illegible]

Not highly related

                                1      2      3      4
1. Bent arm hang
2. Weight holding              .34
3. Push-pull                   .85    .21
4. Vertical pull               .23    .23
5. Rope climb                  [illegible]    .87    .26
6. Push-up (bench)             .37    [illegible]    .10
7. Pull-up                     .27    .27    .35
8. Pull-up (knees bent)        .30



* Adapted from Table I, based on 51 subjects. 


The Wilson study started with tests which were believed to 
measure similar capacity. Statistical evidence bears out similarity 
in some, differences in others. Mohr’s 14 study started with tests 
which were believed to be different as judged subjectively. The sta- 
tistical evidence corroborates that selection. 
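
The practice of dropping one of two tests that correlate highly can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut-off of .60, the test names and the six sets of scores are hypothetical and are not taken from the Wilson or Mohr studies. The function `pearson` computes the ordinary product-moment correlation, and `redundant_pairs` lists every pair of tests at or above the chosen cut-off.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a test with identical scores has no defined correlation")
    return sxy / sqrt(sxx * syy)


def redundant_pairs(scores_by_test, threshold=0.60):
    """Return (test_a, test_b, r) for each pair correlating at or above threshold.

    The default threshold is an assumption chosen for illustration only.
    """
    flagged = []
    for a, b in combinations(scores_by_test, 2):
        r = pearson(scores_by_test[a], scores_by_test[b])
        if r >= threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical scores for six students on three tests (invented for this sketch).
scores = {
    "push_up_knees": [12, 18, 9, 25, 15, 20],
    "push_up_bench": [10, 17, 8, 22, 14, 19],
    "sit_up": [30, 22, 35, 25, 28, 21],
}
flagged = redundant_pairs(scores)
print(flagged)
```

With these invented scores only the two push-up forms are flagged, which would suggest giving one of them rather than both. The teacher still decides where to place the cut-off; the statistics only show which tests overlap.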








TABLE X 

Inter-correlation of Tests in the Mohr Study * 


                       1       2       3       4       5
1. Sit-ups
2. Chair stepping     .101
3. Push-up            .494    .043
4. Bouncing           .361    .218    .329
5. Pull               .229    .101    .042    .207
6. Obstacle race      .411    .077    .349    .341


* From Table I, based on 140 cases. 

The scoring method may affect the time required. Most of these 
tests allow only one or two trials so there is no problem of many 
repetitions. However, a test may be set up with a specified time 
limit and scored on the number of repetitions performed in that 
interval, or it may be scored as the time required to perform a 
specified number of movements. The uniform time is better as it 
makes it possible to administer to the whole group with a single 
timer and stop watch, to know the exact time necessary, and to 
avoid having the faster ones wait while the slow ones perform. If 
the duration of the test is comparatively short in both cases, the 
capacity measured is essentially the same, that for maintaining a 
maximum rate for a reasonable length of time. 
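
The saving from a uniform time limit can be illustrated with a small Python sketch. It assumes, purely for illustration, that the class performs in rounds (for example, partners alternating) and that under the fixed-count method a round lasts until its slowest performer finishes. The function names and the completion times are hypothetical.

```python
def session_seconds_fixed_time(time_limit, n_rounds):
    """All performers in a round stop together, so each round lasts time_limit."""
    return time_limit * n_rounds


def session_seconds_fixed_count(times_by_round):
    """Each round lasts until its slowest performer completes the set number."""
    return sum(max(times) for times in times_by_round)


def waiting_seconds_fixed_count(times_by_round):
    """Total seconds faster performers wait for the slowest one in their round."""
    return sum(max(times) - t for times in times_by_round for t in times)


# Hypothetical seconds needed by each performer to finish a set number of movements.
rounds = [[28, 35, 50], [30, 41, 33]]

print(session_seconds_fixed_time(30, n_rounds=2))  # 60
print(session_seconds_fixed_count(rounds))         # 91
print(waiting_seconds_fixed_count(rounds))         # 56
```

Under these assumed figures the uniform time limit takes a known 60 seconds, while the fixed-count method takes 91 seconds and leaves the faster performers idle for a combined 56 seconds.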

Tests should grow directly out of regular instruction. In that case 
a minimum of specific instruction and special practice will be neces- 
sary when testing is conducted. Then with the careful organization 
of partners or assistants, and a scoring and recording system, little 
time need be spent in testing. Building of fitness requires time; 
most of us have too little time to accomplish what we wish. Let us 
not waste unnecessary amounts on testing. 

Additional time will be saved if the tests are self-administering 
or can be given to each other. This means that the scoring must 
be simplified and the performance so obvious that no errors in 
scoring can be made after brief instructions. When each person 
must be judged by the teacher, or equipment must be adjusted for 
each person by assistants, too much time is spent in waiting. Fur- 
thermore, the tests must be given to all in sequence rather than 
simultaneously. 




Tests which require equipment of any type create a problem. 
Strength testing equipment is not on the market. Where spring 
balance scales can be obtained they may be used for substitute tests 
very successfully. They have been found to be about as valid and 
reliable as the dynamometer tests.* If equipment seems necessary, 
use that which is already in the gymnasium. Obstacle courses have 
been used quite extensively as a means of measuring endurance 
and general agility. Most of these require considerable space and 
labor to set them up. Shorter courses may be arranged in the gym- 
nasium with equipment which is there. In any case in which the 
obstacle course is used, instruction should be given on each part 
of it and practice permitted before it is done under pressure of 
time. This will do much to avoid injuries. The obstacle courses in 
schools have frequently been built for the boys and then used by 
the girls. This invites injuries because the distance to be jumped 
or the heights to be scaled are usually too great for the majority 
of the girls. The boys’ obstacles are seldom padded sufficiently to 
give adequate protection for the girls. This problem may be partially
solved by providing different obstacles along the same course
for the girls.

* Wilson, Marjorie, ibid.
Reliability: grips, pull, push, .94, .89, .76 respectively; spring scale: thrust,
horizontal pull, vertical pull, .91, .82, .91 respectively. Validated against
Rogers short strength index, push and pull, .49; vertical pull, .59.
Mohr, Dorothy, ibid.
Reliability: vertical pull, .93.

Another requirement of good tests is that there be no zero scores,
or no undue massing of scores at any point along the scale. This is
sometimes very difficult to avoid in this type of testing because
of the great range in ability and because the standards we consider
desirable are apt to be far above the ability of many of the group.
One might argue that the tests should not be modified for the
weaker group because it gives them a false impression of their status.
However, the purpose of the whole testing procedure, motivation,
is defeated if some adjustment is not made. If a student and all
his friends in the class get a zero score he is much more impressed
by that fact than he is by the fact that a few persons made a very high score.
Tests may be modified to suit this large group of less capable ones
and then increased in difficulty as they improve. In other words,
the tests should be adjusted so as to be a challenge, rather than a
source of discouragement and criticism. If this rule is followed it
will eliminate some forms of the push-up, of the sit-up, or climbing
for most classes.
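
A teacher who keeps class scores on record could check for zero scores and massing with a short Python sketch such as the one below. The limit of one quarter of the class on any single score value is an assumption made for illustration, as are the function name and both sets of scores.

```python
from collections import Counter


def score_distribution_check(scores, max_share=0.25):
    """Report zero scores and any single score held by too large a share of the class.

    max_share is a hypothetical limit chosen for this sketch, not a published standard.
    """
    if not scores:
        raise ValueError("no scores to check")
    n = len(scores)
    counts = Counter(scores)
    zero_share = counts.get(0, 0) / n
    massed = {value: count / n for value, count in counts.items() if count / n > max_share}
    return {
        "zero_share": zero_share,
        "massed": massed,
        "needs_adjustment": zero_share > 0 or bool(massed),
    }


# Hypothetical scores on a difficult form of a test and on an easier form.
difficult_form = [0, 0, 0, 0, 1, 2, 0, 3, 0, 8]
easier_form = [4, 6, 7, 9, 10, 12, 13, 15, 18, 21]

print(score_distribution_check(difficult_form))
print(score_distribution_check(easier_form))
```

In this invented case six of ten students score zero on the difficult form, so the check calls for adjustment, while the easier form spreads the class along the scale.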

In order to obtain this progression in difficulty, and to select tests 
wisely on the basis of abilities involved, a careful analysis of each 
test must be made. For example, a sit-up is much more difficult if 
the legs are not held down, and much easier if the head and arms 
may be allowed to lead. Likewise, a running test may measure 
speed if set up to get maximum performance for a very short time, 
or it may measure endurance if planned for a longer period at 
maximum or submaximal speed. Another running test may measure 
leg strength, and in slightly different form, agility. If tests are to be 
used in a series this analysis of the test is essential for proper se- 
quence of tests. 

If tests are to measure ability accurately, they must result in little 
or no muscular soreness afterward and a limited amount of discom- 
fort while doing them. Motivation is the controlling factor here 
which will encourage effort in spite of consequences, but effort can 
not be forced, especially against such odds. College women are much
more apt than high school girls to consider such discomfort unnecessary,
and the person who works to that point a “sucker.”
Hence, this point is much more important in working with college 
women. The student may work to the limit the first time but she 
remembers her previous experience and will not really try the 
second time. There are certain tests which students associate with 
immediate or later discomfort and hence they develop a distinct 
dislike for the test, a dislike which is apt to become apparent 
throughout the group. Soreness can be avoided to a considerable 
extent by aiming the conditioning program directly toward the 
tests. The WAC program has followed this principle very well in 
the organization of their physical training system, and in practice 
it has apparently succeeded in this respect. However, their tests are 
performed for a certain number of times, usually limited to a 
rather conservative number of repetitions or limited time interval. 
That is an additional factor conducive to better results. The physiological
effects on a muscle worked to the limit can not be totally
avoided even in a trained muscle. Fatigue products accumulate and
the muscle works under ever less favorable circumstances as the
muscle approaches the contracture of fatigue. The protective re- 
action of the organism tends to make the person decrease effort 
when it becomes too painful; it therefore takes a super type of moti- 
vation to get continued effort on endurance tests after discomfort 
sets in. Not only is submaximal effort obtained for practically all, 
but they quit at very different points along the fatigue curve. This 
makes the measure still more unreliable. 

Safety should be considered as carefully as any quality in selecting 
tests. As in any activity taught there may be an occasional injury, 
but most of these can be prevented by forethought. Certain tests,
such as sit-ups or squat thrusts, are conducive to strains in the abdominal
muscles, or various ill effects on viscera by increased
abdominal pressure. These certainly should not be engaged in by
anyone subject to conditions such as hernia, appendicitis, or severe
dysmenorrhea. Likewise, the squat thrusts, deep knee bends, and
many forms of jumping are hard on knees, particularly if the knee 
is already weak, or if the person is extremely overweight. 

Some tests of endurance involve a run of 150 to 200 yards or 
more. Such runs should not be taken unless the individual has 
been trained for such strenuous activity. Standards of competition 
for women and girls have always contra-indicated such events. Tests 
are essentially competition, competition against standards, or the 
rest of the class, or one’s previous record. It would seem that similar 
criteria concerning strenuousness should be applied to the so-called 
competitive events and to test events. The minimum requirement 
certainly should be adequate training. 

Adequate training to insure safety in the test suggests another 
practice which should be avoided. Tests are frequently given very 
early in a season or semester. The best means of showing improve- 
ment is to test at the beginning and again at the end of the learning 
interval. Care should be taken, particularly on endurance events, 
that they are not given without reasonable pre-conditioning. 

If obstacle courses are to be used for junior and senior high 
school ages, provisions should be made for adjustable or alternate 
obstacles for the very small child. Obstacles which are to be climbed 
or which the person must drag himself over are not very desirable
for girls from the safety standpoint, and if used should be well
padded. 

Before a teacher starts a program of testing for fitness it seems 
important that certain questions be considered. The program is 
largely determined by the answers. 

(1) What aspects of fitness am I teaching? 

(2) Is that kind of fitness measurable, and if so in what units? 

(3) Shall I attempt to measure comprehensively or just one or 
two aspects? 

The answers to the above questions would doubtless point to- 
ward the use of comparatively few tests of the qualities of total 
fitness which are best achieved through physical education expe- 
riences. The questioning then continues. 

(4) Shall I rely only upon highly refined measures or something 
which in spite of technical shortcomings will motivate interest and 
effort? 

(5) What measures are available which will have the least tech- 
nical shortcomings? 

The decisions on these two questions may leave one only a very 
limited choice or may give one opportunity for experimentation 
and ingenuity. Both courses have advantages. The final questions 
may be of this type. 

(6) Is time of training sufficient to produce results which are 
measurable? 

(7) Is time which is available sufficient to justify more than very 
brief testing? 

There is similarity between the fitness tests in use for the two 
sexes. However, some variations are essential. The tests suggested 
in this chapter are primarily for the girls and women. A few of the 
tests may be adapted for boys, or other tests may be selected in 
keeping with the considerations discussed above. 

In most of the classifications below, two or three tests are sug- 
gested. Seldom, if ever, would that many tests for a given purpose 
be used for a single class. Careful selection should be made from 
these suggestions, or from other tests, according to the ability of 
the class, the equipment available, and the requirements for good 
testing suggested above. These tests have been found valuable but 
others may be better adapted to certain situations. 




In choosing a battery of tests some consideration should be given 
to their cumulative effect on the students, especially if they must be 
given in a single class period. When a series of extremely strenuous 
tests is given with the tests following in rapid succession, there may 
be fatigue to the point where the scores are reduced, and the stu- 
dents are excessively fatigued. Alternation of types of tests during 
administration will be helpful, but if a very extensive battery is to 
be given it is usually preferable to give by halves on successive days. 

TESTS OF STRENGTH 

Tests for Arm and Shoulder Girdle 
1. PUSH-UP (ON KNEES)

Description: Lie in the prone position with hands under the 
point of the shoulders, elbows spread a little (a in Figure 25). Feet 
may be raised from the floor or not as preferred. Keep body straight 
and extend arms fully; weight will be resting on hands and knees 
(see b). Bend arms so that chest again touches the floor. Repeat
promptly and continue as long as possible or for a stated number
of times. Score is the number which can be done before stopping or
before position is changed.

Figure 25. Push-up from Knees (a: top view; b: side view)

Suggestions: If the hips and lower back are allowed to sag at all
the weight is transferred progressively up the thigh, hips, and trunk
during body lowering, and in reverse order in the lift. It is necessary
to keep the back straight and to hold the hips in very slight flexion
(not exceeding 5°) in order to keep the weight on the arms.

If the performer keeps up regular breathing it will be more com- 
fortable and tend to give greater endurance. 

The knees should be protected by placing performers on mats, 
or if the floor is not too slippery by folding a sweat shirt or towel 
under the knees. 

Train students in proper form and in judging form before giving 
the test. Then give as a mass test to half the class with partners 
scoring. 

2. PULL-UP (ON HORIZONTAL BAR)

Adjust the bar to the level of the xiphoid (angle between the ribs 
at the base of the sternum) when the subject stands erect. 



Description: Grasp the bar with the hands about shoulder width 
apart and palms toward the face. Move the feet far enough beyond 
the bar that when the weight rests on the heels, with knees, hips 
and back straight, the line of the body forms a right angle with the 
line of the straight arms. (See Figure 26.) Keep the body straight 
and bend the arms until the neck or upper chest touches the bar. 
Extend arms again. Repeat again without pausing and continue as 
long as possible or for a stated number of times. Score is the number 
which can be done before stopping or before the body begins to 
sag or sway. 




Suggestions: The sides of parallel bars may be used if a horizontal 
bar is not available. This does not permit quite as exact adjustment 
for height. Flying rings may also be used but give less uniform 
results because they swing. 

Usually equipment is very limited so only a few need be trained 
to administer the test. 

A similar form may be done with a partner who holds a wand 
and stands astride the performer who starts lying on her back. The 
performer grasps the wand and pulls up until the chest touches
the wand, lowers to full arm length, and repeats. The top person
must stand straight and firm, and with feet far enough apart to 
permit the performer to reach the wand. Sometimes the top per- 
son tires before the one who is taking the test. For that reason it 
seems better to have two persons hold the wand, one at each end. 
One of the advantages of these forms is that there is no need for 
adjustment of the bar. If no special equipment is available, small 
size softball bats may be used. 

Another form of this test puts the feet flat on the floor, knees are 
bent at right angles, trunk and thighs are horizontal, and arms are 
vertical. When the chinning is done the movement is at the knees, 
not at the ankles. A little practice is necessary to standardize form. 

3. PUSH AND PULL 

Use the hand dynamometer with push-pull attachment. 

Description: Hold the apparatus in front of the chest, one hand 
on each handle; elbows are bent and arms in a horizontal plane. 
Pull as hard as possible. This pull is similar to that exercise in 
which one tries to touch elbows behind the shoulders. (Assistant 
records score and resets dynamometer.) In same position push in 
on apparatus as hard as possible; the heel of the hand may be used. 
Do not brace the apparatus against the chest in either trial. 

Suggestions: If the handles have sharp edges they should be 
covered or padded with adhesive tape or something that will not 
slip. 

4. VERTICAL PULL (WITH SPRING SCALE)

Use a good grade spring scale. The lower end of the scale is 
fastened to the floor. A rope, securely fastened to the upper end
of the scale, is run through an overhead pulley so that the handle
end can be reached from a standing position, with the arm slightly 
flexed. The length of the rope should be adjusted to the height 
of the shortest girl. The rope may be shortened for taller girls by 
slipping a wooden peg through one or more loops of the rope. 

Description: Stand erect, in a comfortable stance and with shoul- 
ders fixed, pull down as hard as possible without bending the knees 
or hips, or twisting the body. Score is the number of pounds reg- 
istered on the scale. 

Suggestions: The examiner must squat to the level of the scale 
in order to read it accurately. 

Watch to see that the subject does not bend knees or trunk, then 
quickly read the scale while she continues pulling. 

Take the girls according to height in order to save time in ad- 
justing the rope. Give instructions to all rather than individually 
as each starts the test. 

TESTS OF ABDOMINAL STRENGTH 

1. SIT-UP 

Description: Lie on the back with knees bent, partner holding 
the feet down firmly. Fold the arms and hold against the chest. Rise 
to an erect sitting position, return to back lying. Repeat as many 
times as possible or for a stated number of times. Score is the num- 
ber of times the erect position is assumed. 

Suggestions: It is permissible to lead a little with the head, and 
this makes it unnecessary to judge for fouls. The elbows are kept 
down to prevent the momentum of an arm swing. 

Variations may be used; advantages and disadvantages follow:

(1) hands behind neck, elbows kept back: encourages forward
head and permits excessive arm pull

(2) hands on the shoulders, elbows against the ribs: reduces
arm momentum, but puts the weight of the arms forward and therefore
makes the test easier

(3) arms extended along sides and hands kept on the floor:
avoids a high arm lead, but may permit a push-off with the hands
or elbows from the floor

(4) arms extended, hands sliding along on top of thighs: difficult
to prevent the subject from pulling up by grasping the thighs or
clothing

(5) hands on shoulders, legs straight, bend forward far enough
to touch the elbow to the opposite knee, alternating right and left:
uses oblique abdominal muscles more in latter stage of the sit-up,
encourages pronounced spinal flexion

(6) head, neck and back held straight throughout: good as an
advanced and difficult form, difficult to judge and score

(7) no support on feet: makes the test very difficult, may result
in many zero or low scores

(8) legs straight, heels on floor: there is a tendency to hollow
the lower spine excessively, the hip flexors do more of the work
than when the knees are bent

2. ROCKER 

Description: Start in back lying position, hands on the shoulders 
with elbows extended straight sideward. Raise the feet and trunk 
off the floor a few inches so as to be supported only on the hips. 
Rock gently from side to side while holding this position. Sideward
movement is checked by contact of the elbow on the floor; a push-off
with the elbow is permitted. Score is the number of seconds the
position and action are maintained. 

Suggestions: Continue regular breathing for greater comfort. 

Variations are sometimes used: 

(1) V position held with no rocking movement, arms extended 
just off the floor. Rocking gives the advantage of momentarily, 
moderately relaxing the muscles on alternate sides of the abdominal 
wall; therefore, it slightly prolongs the time and gives better dis- 
tribution of scores. In the V position, the extended arms are hard 
to keep off the floor or thighs. 

(2) V position, held with hands holding the thighs. The position 
is fixed and the abdominal muscles are almost entirely relieved of 
action. It becomes primarily a balance test. 

Test of Foot Strength 

1. BOUNCING 

Description: Take a full squat position with the knees fully
flexed and clasp the arms around the legs in such a way that the
knees can not extend. Bounce continuously in place by extending
the ankles, using enough force to come just off the floor. Continue 
as long as possible. Score is the number of bounces which can be 
made at the rate of 120 per minute. If balance is lost or arms un- 
clasped, resume position and continue, providing not more than 3 
counts are lost. Deduct the number lost from the total count. 

Suggestions: Cross the arms and grasp near the elbows, otherwise 
the knees and hips extend. All force should come from the ankles 
and feet. If bouncing is low there is less tendency to try to use 
the knees. 

Counting should be done centrally. Use a metronome or drum 
to keep the rhythm. Count aloud so that all may hear; each student notes
the count on which she quits and subtracts any lost counts before
recording her score. Partners may do the scoring.
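
The scoring rule for the bouncing test can be written out as a brief Python sketch. Two points are assumptions made here for illustration: that the limit of 3 lost counts applies to each interruption separately, and that a longer interruption ends the test at the count where form was lost. The function and argument names are hypothetical.

```python
BOUNCES_PER_MINUTE = 120  # rhythm set by the metronome or drum
MAX_LOST_PER_BREAK = 3    # assumption: the limit applies to each interruption


def bouncing_score(quit_count, breaks=()):
    """Return the score: the count reached minus counts lost in permitted breaks.

    quit_count: central count on which the student stopped.
    breaks: (count_when_lost, counts_lost) pairs in order of occurrence.
    """
    lost_total = 0
    for count_when_lost, counts_lost in breaks:
        if counts_lost > MAX_LOST_PER_BREAK:
            # Too long a break: treat the test as ending where form was lost.
            return count_when_lost - lost_total
        lost_total += counts_lost
    return quit_count - lost_total


def seconds_elapsed(count):
    """Seconds needed to reach a given central count at the set rhythm."""
    return count * 60 / BOUNCES_PER_MINUTE


print(bouncing_score(95, [(40, 2)]))           # 93
print(bouncing_score(95, [(40, 2), (70, 5)]))  # 68
print(seconds_elapsed(120))                    # 60.0
```

A student who quits on count 95 after losing 2 counts at one point scores 93. Under the assumption above, a second break costing 5 counts at count 70 would end her test there with a score of 68.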

TESTS OF ENDURANCE 

Endurance is the most difficult aspect of fitness to 
measure. Endurance is primarily the result of a physiological ca- 
pacity of the organism to continue functioning satisfactorily. En- 
durance may be either the ability to maintain action at maximum 
speed for a short period of time, or the ability to maintain action at 
a slower rate for an indefinite period of time. The former type of 
endurance is the easiest on which to set up tests, but it does assume 
that the subject will put forth the effort to work at a maximum rate. 

The tests listed here are considered the best of those now in use, 
when the criteria presented in the first of the chapter are applied. 
Since cardio-respiratory function is indicative of general organic 
endurance, such tests are frequently used. Only one such test is 
listed here since they are not practical for most physical education 
teachers. The other tests aim at measuring endurance to perform 
at a high rate for a comparatively short period. 

The first test may be given to one-third the class at a time. The 
latter two are limited, usually to one or two students at a time,
because of equipment required, or the nature of the test. Choose one.




1. CHAIR CLIMBING

Use ordinary chairs; arm chairs from a class room are suitable. 
The only requirement is that all are of the same height for any 
groups to be compared. The number required is one-third that 
of the students to be tested. 

Work in groups of three, numbered One, Two, and Three. One
performs; Two holds the chair and counts aloud the number of
times One mounts the chair. Three holds the chair and writes the
scores on an individual score card. One stands in a position of
readiness beside the chair with one foot on the chair, holding
Two’s left hand with her own right hand. On the signal, Ready, Go!
she rises to an erect standing position with both feet on the chair.
Immediately she steps down to the floor with the same foot with
which she started on the floor. She continues as rapidly as possible
until the final whistle. She may change feet occasionally by making
the shift while both feet are on the chair. She may hold on to
Two’s hand throughout the test.

The timing is done centrally by one timekeeper, who has a stop 
watch and two whistles with distinctly different sounds. (A timer’s 
horn works very well for one.) The timer gives the starting signal, 
at the end of 5 seconds she blows a whistle, again at the end of 20 
seconds she blows the same whistle, then at the end of 50 seconds 
she blows the second whistle, which is the final signal. Two counts
continuously from “Go” to the final whistle. Three listens for all 
three whistles and immediately after each one writes the count she 
heard simultaneously with or just preceding the whistle. The three 
then change places and the test is repeated twice so each one per- 
forms. 

The endurance score is the ratio between the score for the 15 
second interval and the score for the 45 second interval. These are 
obtained by subtracting the count for the first 5 seconds from that 
at 20 seconds and at 50 seconds. The 45 second score is then divided 
by that for the 15 seconds. If the 15 second score is less than 16, 
then 16 is used as the divisor. 
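The arithmetic of the endurance score can be sketched in a few lines of Python. This is only an illustration of the rule as stated above; the function name and the sample counts are hypothetical and are not part of the published test.

```python
def chair_climbing_endurance(count_5s, count_20s, count_50s, min_divisor=16):
    """Endurance ratio for the chair climbing test.

    count_5s, count_20s, count_50s are the running counts that Three
    wrote down at the 5-, 20-, and 50-second whistles.
    """
    if not (0 <= count_5s <= count_20s <= count_50s):
        raise ValueError("running counts must be non-negative and non-decreasing")

    score_15 = count_20s - count_5s   # mounts in the 15-second interval
    score_45 = count_50s - count_5s   # mounts in the 45-second interval

    # The text says 16 is used as the divisor when the 15-second score is below 16.
    divisor = max(score_15, min_divisor)
    return score_45 / divisor


# Hypothetical card: 6 mounts at 5 sec., 26 at 20 sec., 60 at 50 sec.
print(round(chair_climbing_endurance(6, 26, 60), 2))  # 54 / 20 = 2.7
```

A ratio near 3.0 would mean the 45-second pace matched the 15-second pace; lower ratios show the performer slowing down.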

Suggestions: The assistants should hold the chair firmly to
prevent its slipping. One grasps Two’s hand as a safety measure to
prevent loss of balance or unsteadiness when she becomes tired.

The performer can postpone the fatigue of the leg muscles if she
alternates frequently the foot left on the chair. She will save time




the results. However, the heart rate is so variable and dependent 
upon so many factors that the value of this test would seem to be 
reduced as much or more by occasional lack of effort. 

TESTS OF AGILITY 

1. OBSTACLE RACE 

The race which is described in Chapter 6, p. 136, may be used
as an agility test, particularly after practice has been allowed on 
the race as a whole, or on its parts. If given at the beginning of the 
year, a single administration may serve the purpose in both bat- 
teries. 

2. SHUTTLE RACE

This race is described on p. 149, and may also be used for this 
purpose. It will be a little more economical of time than the ob- 
stacle race since it can be given to several at the same time. 

TESTS OF FLEXIBILITY 

Flexibility must be considered in terms of a given 
joint or of adjacent joints, just as strength is considered with refer- 
ence to specific muscle groups. Flexibility is desirable only as it 
contributes toward some other ability or to freer movement. Deter- 
mine the type or types of flexibility desired and choose one test for 
each type selected. 

Tests of Hip and Back Flexion 

1. STANDING, BOBBING

Arrange a scale marked in half inch intervals on the front of a 
chair, stool, or platform with the inches marked above and below 
the level of the chair surface. The scale should be not more than 
3 to 4 inches wide, and the chair must be stable. 

Description: Stand with toes even with the front edge of the 
chair and against the sides of the scale. Let the arms and trunk 
relax and hang forward, fingers in front of the scale. Then bob
downward forcefully three or four times reaching equally with the
fingers of both hands. The knees must be kept straight. Score is
the lowest point reached in the series of bobbings.

Suggestions: A 20-inch scale may be marked from the top downward
and attached so that 10 represents the level of the top of the chair.
This usually provides measurement for the range of flexibility 
found in a class. If the suggested standard of performance is for 
some point above or below the chair level this is the best arrange- 
ment. 



Figure 27. Standing Bobbing Test


The other alternative is a similar length scale, but the chair 
level is zero with deviations progressing upward or downward 
from that point. The score is then minus if the reach is short of 
the stool level, or plus if it is beyond that level. If the accepted 
standard is the ability to reach the surface on which one stands, 
this is a very descriptive form of scoring and is preferable. 
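The two scoring forms carry the same information, and a reading on one can be converted to the other. The sketch below assumes the 20-inch scale reads 0 at the top, 10 at chair level, and 20 at the bottom, as suggested above; the function name and sample readings are hypothetical.

```python
CHAIR_LEVEL_MARK = 10.0  # mark on the 20-inch scale that sits at chair level
SCALE_LENGTH = 20.0      # total length of the scale in inches


def signed_reach(scale_reading):
    """Convert a 0-20 scale reading to a plus/minus score about chair level.

    Plus means the fingers reached beyond (below) the chair surface;
    minus means the reach stopped short of it.
    """
    if not 0.0 <= scale_reading <= SCALE_LENGTH:
        raise ValueError("reading must lie on the 20-inch scale")
    return scale_reading - CHAIR_LEVEL_MARK


print(signed_reach(12.5))  # reach 2.5 inches below chair level
print(signed_reach(7.0))   # reach 3 inches short of chair level
```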

The test may be scored as the reach which the performer can 
attain and hold for two or three seconds. This will slightly reduce 
the scores, and will make it easier to score. 

In order to score accurately, the person giving the test must have 
the eyes down on a level with the reach. A student assistant is 
helpful. She can watch for bent knees and also be ready to give 
support if the performer loses her balance. 




2. SITTING, BOBBING

Description: Sit on the floor with the legs straight and extended 
at right angles to the line of the boards in the floor. Place the heels 
on a crack between the boards and with feet about five or six 
inches apart (just wide enough to get the hands between the heels). 
A partner stands with her feet against those of the performer to 
prevent the latter from slipping. The performer bends forward 
reaching as far forward as possible on the floor. The knees must 
be kept straight. Score is the farthest point reached by the finger 
tips. 

Suggestions: This may be given as a mass test and since its main 
advantage over standing bobbing is for quickness of administration 
it would seem advisable not to use special floor markings. If suffi- 
cient rulers are available they may be held in place by one heel of 
the standing partner. The zero point is then the heel line and the 
score is plus if the reach is beyond the heels or minus if short of 
them. 

The most satisfactory means is by using the boards of the floor 
as units of measurement. Most gymnasium floors are laid with 314 
inch boards. The score is measured to the nearest half of the board 
(in other words to the nearest inch). For example, if the reach is
almost to the middle of the second board beyond the heel, the
score is +1½. If the reach is almost to the second crack beyond
the heel the score is +2. If narrower boards are used, count to the
nearest crack.
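The rounding rule for board scoring can be illustrated as follows. This is a sketch only: it assumes the scorer can estimate the reach as a decimal number of boards beyond (plus) or short of (minus) the heel line, which is a hypothetical convenience rather than anything the test requires.

```python
import math


def board_score(reach_in_boards):
    """Round a reach, expressed in boards from the heel line, to the nearest half board."""
    # floor(x + 0.5) rounds halves upward consistently, unlike Python's round().
    return math.floor(reach_in_boards * 2 + 0.5) / 2


print(board_score(1.45))  # almost to the middle of the second board
print(board_score(1.95))  # almost to the second crack
print(board_score(-0.8))  # short of the heel line
```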

A variation of these which is sometimes used is done in a sitting 
position with the legs straight and together. Bend forward to touch 
the forehead to the knees. If the head touches, spread the legs just 
enough to get the head between the legs. Bend forward to touch 
the forehead to the floor. This method has the advantage of elimi- 
nating all direct measurements and of reducing the influence of 
arm length. It also includes neck flexion which is omitted in the 
other forms. However, it leaves most of the class in one of two 
groups, those touching the knees, and those not touching. There 
will be a third, comparatively small group who reach the floor. 
This may be adequate differentiation and is faster, each one scor- 
ing her own performance. 

These tests all measure almost identical forms of flexibility. However,
the standing scores will always run a little higher because of
the more effective use of gravity and because the hips are shifted 
back of the heels when standing, thus shortening the distance to 
the feet. They are measures not only of back and hip joint flexi- 
bility but also of elasticity and relaxation of the hamstring (pos- 
terior thigh) muscles. This is doubtless of more importance as a 
basis for success in certain activities such as dance or tumbling 
rather than in every day activities. It also seems to be related to 
ease and economy of muscular effort in many movements. 

Students who take flexibility tests sometimes comment on the 
disadvantage which they have because of the shortness of legs or 
arms. Teachers who are considering the use of the tests are also 
prone to think of body build as a determining factor in the scores 
obtained. With this in mind anthropometric measures were taken 
on a hundred college women acting as subjects in a flexibility 
study.* There is no evidence from these data to indicate that
variations of body build as found in an average group would affect
flexibility scores unduly. The correlations between the standing bobbing
test (p. 128) and the measures which might appear to affect bobbing
ability follow:


Height                      + .157
Trunk length                + .148
Arm length                  + .294
Trunk and arm length        + .297
Ankle flexion               + .178
Spinal extension            + .262
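One common way to judge how small these correlations are is to square each coefficient, which gives the proportion of variance the two measures share. That interpretation is standard statistical practice and is not taken from the study itself. The sketch below reads the height entry as .157, on the assumption that its decimal point was lost in printing.

```python
# Correlations with the standing bobbing test, as listed above.
correlations = {
    "height": 0.157,  # assumed reading of the height entry
    "trunk length": 0.148,
    "arm length": 0.294,
    "trunk and arm length": 0.297,
    "ankle flexion": 0.178,
    "spinal extension": 0.262,
}

# r squared is the proportion of variance shared by the two measures.
shared_variance = {name: r * r for name, r in correlations.items()}

for name, r2 in shared_variance.items():
    print(f"{name}: about {100 * r2:.1f} per cent of variance shared")
```

Even the largest coefficient accounts for less than one-tenth of the variance in bobbing scores, which agrees with the conclusion that body build does not affect the scores unduly.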


The correlations between spinal extension (described in the 
following test) and various measures are equally low. 


Shoulder flexibility        + .262
Trunk length                + .180

Pull strength + - * 3 ® 


Test of Extension in Upper Back 


1. SPINAL EXTENSION

Description: Lie in a straight prone position on the floor or a
table. Hands are clasped together above the hips. Raise the head
and shoulders from the floor by arching the upper back; pull with
the arms keeping the lower corner of the ribs on the floor. Score
is the vertical distance from the suprasternal notch (top of the
sternum) to the floor.

* Wilson, Marjorie, and Scott, M. Gladys: A Study of Flexibility in Relation to
Physical Education Activities, unpublished study.

Suggestions: Fixation of the ribs on the floor is best assured by
an assistant who places a finger on the lower points of the rib
cage where contact is to be maintained. The assistant can ask for
adjustment of position if the performer arches so much in the lower
back that the ribs leave the floor.

Measurement can be made most easily by placing one end of a
string on the suprasternal notch when the performer starts to lift.
The string is pulled taut and straight to the floor (vertically) while
she is at the top of her extension. Measurement of the string from
finger tip to finger tip is then made on a ruler. The score is read
to a scorer.

Measurement is most rapid if students lie down side by side, 
always with one or two ready ahead of the one being measured. 
When each girl is measured she may go on to the next test or to 
practice. 

This type of flexibility facilitates many movements. A reasonable 
degree is also important in maintenance of good posture. To a 
slight extent it also measures strength of spinal extensors since the 
body must be lifted and held there momentarily for measurement. 

Range in trunk twisting is also desirable in many activities but 
measurement techniques are highly unreliable. Measurement 
should be in a sitting position if movement is to be limited to the 
spine, standing position if hip rotation is to be included. Variations 
in shoulder girdle action also add to the unreliability, and vari- 
ability from one individual to another. 


BIBLIOGRAPHY 

1. Bookwalter, Karl W.: “Test Manual for Indiana University Motor Fitness
Indices for High School and College Men,” Research Quarterly, 14, December
1943, p. 356

2. Brouha, Lucien: “The Step Test: A Simple Method of Measuring Physical
Fitness for Muscular Work in Young Men,” Research Quarterly, 14, March
1943, p. 31

3. Brouha, Lucien, and Gallagher, J. Roswell: “A Functional Fitness Test for
High School Girls,” Journal of Health and Physical Education, 14, December
1943, p. 517

4. Cassidy, Rosalind, and Kozman, Hilda Clute: Fitness First, A Physical Fitness 
Workbook for High School Girls. A. S. Barnes & Company, New York, 1943 




5. Clarke, Harriet: “A Functional Physical Fitness Test for College Women," 
Journal of Health and Physical Education, 14, September 1943, p. 358 

6. Cureton, T. K,: Physical Fitness Workbook. Stipes Publishing Company, Cham- 
paign, Illinois, 1944 

7. Espenschade, Anna: “Report of the Test Committee of the Western Society of 
Departments of Physical Education for Women in Colleges and Universities,” 
Research Quarterly, 14, December 1943, p, 397 

8. Hall, D. M., and Wittenborn, J. R.: “Motor Fitness Tests for Farm Boys,”
Research Quarterly, 13, December 1942, p. 432

9. Havlicek, Frank J.: "Speed Sit-ups,” Research Quarterly, 15, March 1944, p. 75 

10. Kistler, Joy: “A Study of the Results of Eight Weeks of Participation in a
University Physical Fitness Program for Men,” Research Quarterly, 15, March
1944, p. 23

11. Leighton, Jack R.: “A Simple Objective and Reliable Measure of Flexibility,”
Research Quarterly, 13, May 1942, p. 205

12. Miller, Ben W., Bookwalter, Karl W., Schlafer, George E.: Physical Fitness
for Boys. A. S. Barnes & Company, 1943, Chapter X

13. Minnesota State Defense Council: "Physical Fitness and Recreation Program,” 
Bulletin No. 2, Series B, St. Paul, Minnesota 

14. Mohr, Dorothy: Measured Effects of Physical Education Activities on Certain 
Aspects of the Physical Fitness of College Women, Research Quarterly, 15, 
December 1944, p. 340 

15. National Association of Physical Education for College Women: Proceedings 
of Victory through Fitness Workshop, 1943, pp. 43-48 

16. O’Connor, Mary Evangeline: Motor Fitness Standards for High School Girls.
Unpublished M.A. thesis. University of Illinois, Urbana, 1944

17. Phillips, Marjorie, Ridder, Eloise, and Yeakel, Helen: “Further Data on the
Pulse Ratio Test,” Research Quarterly, 14, December 1943, p. 425

18. Research Committee, National Section on Women’s Athletics: "Physical Per- 
formance Levels for High School Girls,” Journal of Health and Physical Edu- 
cation, 14, October 1943, p. 424 

19. Springfield College Studies: "Physical Fitness," Research Quarterly, 12, Sup- 
plement May 1941 

20. Steinhaus, Arthur H., Hawkins, Alma M., Giaque, Charles D., Thomas,
Edward C.: How to Keep Fit and Like It. George Williams College, Chicago, 1943

21. United States Office of Education: "Physical Fitness through Physical Educa- 
tion, Victory Corps Series," Pamphlet No. 2. Superintendent of Documents, 
U. S. Government Printing Office, Washington, D. C., 1943, p. 81 

22. United States Office of Education: “Physical Fitness for Students in Colleges 
and Universities.” Superintendent of Documents, U. S. Government Printing 
Office, Washington, D. C., 1943 

23. United States Office of Education: “Scales for Tests for High School Boys of
Strength of the Abdomen and Back,” Education for Victory, 3, August 21, 1944,
p. 4

24. United States War Department: WAC Field Manual, 35-20; Physical Train- 
ing. Superintendent of Documents, U. S. Government Printing Office, Wash- 
ington, D. C., 1943 

25. Wilbur, Ernest A.: "A Comparative Study of Physical Fitness Indices as Meas- 
ured by Two Programs of Physical Education: The Sports Method and the 
Apparatus Method,” Research Quarterly, 14, October 1943, p. 326 

26. Wilson, Marjorie: “A Study of Arm and Shoulder Girdle Strength of College
Women in Selected Tests,” Research Quarterly, 15, October 1944, p. 258.



6.


Measurement of General Motor 
Ability 


MOTOR ABILITY DEFINED 

Every teacher knows that some students learn very 
much more rapidly and with less apparent effort than other students. 
It is also well known that after an interval of instruction and 
practice some students have a greater variety of skills and greater 
proficiency in them. This is frequently explained rather vaguely as 
a difference in motor ability. This is actually true, and since it 
affects the teaching-learning situation so deeply it seems worthy of 
careful consideration. 

Writings by various authors have led to the use of several similar 
terms with different connotations. This has sometimes been con- 
fusing; therefore, discussion of these terms follows. 

Motor educability is the inherent aptitude (motor and mental) 
for learning new skills quickly and effectively. In carrying out this 
concept it is understandable that tests of this characteristic should 
involve motor problems new to the subject; that they be presented 
through the usual media of instruction, verbal description and dem- 
onstration; that they prohibit preliminary practice, and allow 
very few trials; that they be of a success or failure type. Therefore, 
most of the tests which are proposed for measuring educability are 
of a stunt type, and usually include several stunts in order to secure 
satisfactory reliability. The greatest difficulty is in devising motor 
problems which are new to the performer. 

Motor capacity is very similar to educability but is really a little
broader. Batteries measuring it may contain an educability test plus
general agility items such as obstacle or dodge runs, or Burpee tests.

Physical capacity implies a fitness or capability for performing 
motor activity. Since that capability is dependent upon several
things the tests vary from physiological functioning of circulatory
and respiratory systems, to strength, or reaction time. Such tests 
will not be considered here as they are partially taken care of by 
proper medical examination, and partially in fitness testing. 

The term motor ability is sometimes used to mean achievement 
in basic motor skills; or it may be interpreted as a more general 
term combining the concepts of motor educability and achieve- 
ment. How successfully achievement and educability can be sepa- 
rated is still an unsolved question. Motor ability measurement is 
usually concerned with some form of running, throwing, and jump- 
ing; tests are repeated from time to time, and practice on them is 
permitted. The level of ability recorded may be due to capacity 
for neuro-muscular coordination, to practice, to strength, or other 
less evident factors. 

It seems that the information needed by the teacher concerning 
the student, at least from the junior high school age up, is aptitude 
for learning, ability in the fundamental skills, and ability in the 
various sports or activities. The first two points are of a general 
nature and can be interpreted in relation to any activity. Further 
ability is specific for each activity. In every case it is necessary to
abbreviate testing as much as possible and to relate the tests to one
another whenever possible.

REQUIREMENTS FOR A MOTOR ABILITY TEST 

Since it is very difficult to separate the measurement 
of aptitude and of achievement, and for the sake of abbreviation of 
the testing procedures, it would seem advisable to follow the second 
interpretation of motor ability stated above and consider them as 
dual and interdependent aspects of general motor ability. Let us 
consider the requirements for such a set of measures. 

(1) First of all it would be necessary to have unusual situations, 
or motor acts relatively new to the subjects. 

(2) Students should not practice on the test as such. 

(3) It is essential that students be given a clear idea of the prob- 
lem presented by the test but that should not include specific 
coaching or instruction on techniques to be used. 




(4) Principal activities in the physical education program should 
be analyzed for the skills that they have in common, for example, 
balance and weight control, eye-hand coordination, strength, agility, 
speed, etc. The tests should be set up to include as many of these 
as possible. 

(5) Tests combining more than one element in a significant way 
should be used when possible. 

(6) Part of the test should give opportunity to demonstrate skill 
developed by those who have worked hard previously. 

(7) The tests should not put undue emphasis on endurance, 
strength or any other one factor. 

A MOTOR ABILITY BATTERY 

The motor ability test battery presented here has 
been successfully used both with college women and high school 
girls.12 Let us consider this battery in light of the above criteria
which are peculiar to this type of test and supplement those criteria 
discussed in Chapter Two. 

The minimum battery recommended is the obstacle race, basket- 
ball throw and standing broad jump. The 4-second dash and wall 
pass may be added or substituted for the obstacle race. 

1. Obstacle Race 13 

The space needed is 55 feet by 12 feet; equipment needed, three
jump standards and a cross bar at least 6 feet long; lines on the
floor. (See Figure 28.)

Description: Subject starts in a back lying position on the floor
with the heels at line a. On the signal, Ready, Go! get up and start
running toward J. As you come to each square on the floor, step on
it with both feet. Run twice around J, turn back to d, go under
the cross bar, get up on the other side, run to line c and continue
running between lines b and c until you come to c for the third
time. Score is the number of seconds (to the nearest .1 second)
that is required to run the course.

12 See references in bibliography at end of this chapter for report on college use. 
For high school use: Smalley, Jeannette, and Scott, M. Gladys: Motor Ability Tests 
in Junior and Senior High School, unpublished study. 



Figure 28. Floor Markings and Pathway for Obstacle Race

a: starting line
b: line for the shuttle
c: finish line
d: cross-bar (18" high)
J: jump standard
Spot on floor: 12" x 18"
Distance from end of cross-bar to line of inner sides of spots: 4' 4"




Suggestions: Give instructions to all so they need not be repeated 
when individuals are ready to run. Demonstrate what is meant by 
stepping with both feet on each square. 

Each successive runner should lie down as soon as the girl ahead 
is up. This avoids delay in starting new runners. 

If two timers and watches are available, the next girl starts as 
soon as the one ahead finishes circling the standard. Approximately 
twice the number can be scored on the same course with this ar- 
rangement. 

Do not call the runner back if the toe or heel extends outside of 
the square. Some feet are too large to fit inside the square if the heel 
is lowered. Judge on whether the stride is adjusted to contact the 
square and whether there is a transfer of weight in the square. 

2. Basketball Throw for Distance 

Space needed is about 80 feet long and 20 feet wide, a throwing 
line marked about 8 feet from one end of the course and parallel 
lines every 5 feet beginning 15 feet in front of the throwing line. 

Description: Start anywhere you wish behind the throwing line, 
but do not step on or across the line when throwing. Throw in 
any way you wish, three consecutive times. The score is the distance 
from the throwing line which the ball travels before touching the 
floor. Only the longest throw counts. 

Suggestions: Explain carefully but do not demonstrate. Answer 
questions about the test except those on throwing technique. If 
asked whether the throw should be overhand or underhand, 
whether from a stationary position or with a step or run, simply 
reply that the throw may be of any type, providing the feet are kept 
behind the line; the purpose is to throw the ball as far as possible. 
This may not be good teaching procedure but it is essential for this 
form of testing if you wish to know how the player is apt to meet 
similar problems of throwing in a game. 

It is true that some will profit more than others from seeing other 
students perform, but they are also the ones who learn quickly 
from class instruction. The ones who do not profit from errors and 
success of classmates, doubtless will be slow to profit from class 
instruction. 




If the gymnasium is too short and the test can not be given out- 
side, a diagonal course across the gymnasium may be used. This 
insures sufficient distance in practically any gymnasium but leaves 
little space in which to carry on other class activities during the 
test. 

3. Standing Broad Jump 

If given outside it is necessary to have a jumping pit with sunken 
take-off board within 30 inches of the edge of the pit. If given
indoors it requires mats at least 7½ feet long and a solid board at
least 2 feet long (beat boards used with apparatus are excellent) 
placed against the wall to prevent slipping. If the mat is marked in 
2-inch intervals it eliminates the need to measure each jump with 
a tape. 

Description: Performer stands on the take-off board, toes may be 
curled over the edge of the board. The take-off is from both feet 
simultaneously, the jump is as far forward on the mat as possible. 
The score is the distance from the edge of the take-off board to the 
nearest heel (or to the nearest part of the body if the balance is 
lost). The best of three trials will be counted. 

Suggestions: Preliminary swinging of arms and flexing of knees 
are permissible providing the feet are kept in place on the board 
until the actual take-off. 

Be sure performer understands what is to be done. 

When the use of a take-off board is not feasible, jumping may 
be done from the mat if the mat is heavy enough that it will not 
slip. 

The following two tests may be added if time permits the ad- 
ministration of the additional items. The obstacle race need not 
be given if these are added. 

4. Wall Pass

A flat wall space is necessary at least 8 feet square. A line is drawn 
on the floor parallel to the wall and 9 feet from it. Several such 
spaces are desirable in order to test several persons at one time. 
Timing for all areas may be done by a single timer. 




Description: Stand facing the wall, behind the line. Throw the 
ball against the wall, catch it when it comes back and repeat again 
as quickly as possible. Stay behind the line all the time. The throw 
may be of any type and the score is the number of hits on the wall 
in the time allowed (15 seconds). 

Suggestions: The test may be administered very quickly if several 
testing areas are available and about four players start at each area. 
The first one is tested while the second counts the hits and watches 
the foul line. Then the first reports the score while the second is 
tested and the third counts. This is repeated until all are finished. 
A player who steps across the line slightly should be called back 
by the player scoring the trials. If the feet are in the proper position 
for the next throw, the error is not considered. If fouls are con- 
tinuous the entire trial must be repeated. 

Allow each person time for three or four practice throws before 
taking the test. 

If a ball drops between the wall and the line it may be necessary 
for the player to cross the line to recover the ball. However, the 
next throw must be made from behind the line. 

One trial is usually sufficient on this test. A second trial should
be given, however, in case of interference of any type or in case
the ball gets entirely out of control. It should not be repeated simply
because of fumbling. If time and facilities permit administration 
of two or three trials for all, the higher score should be used. 

5. Dash (4 seconds) 

It is desirable to have a straight course at least 85 to 90 feet long 
and 4 feet wide. It may be laid out diagonally across the gymnasium 
if space is too short otherwise. The starting line should be at least 
3 feet in front of the wall. The course is marked in one-yard zones 
beginning at 10 yards from the starting line to about 27 yards from 
the starting line. The additional distance allows the runner space 
in which to stop. 

Description: Start in any position you wish with the toes behind 
the starting line. On the signal, Ready, Go! start running as fast
as possible and keep going as fast as possible until the whistle blows.
You may run as far as you wish after the whistle sounds (at the end
of 4 seconds). The score is the distance you have run between the
starting signal and the whistle.

Suggestions: One trial is sufficient unless there is outside inter- 
ference. 

The judge on this should be carefully trained. It is best to use 
two persons, a timer and a judge. The timer starts the runner and 
blows the whistle. The judge determines the zone into which the 
foremost part of the body extends when the whistle blows. One 
person may assume both responsibilities after training. In this case, 
the watch is extended forward so the watch and the runner are in 
a straight line of vision at the end of the 4 seconds. 

The judge should attempt to be parallel with the runner at the 
final signal. With very little experience the judge will learn by 
looking at the runner during the start and the first few strides 
whether the finish will be short, around 13 to 15 yards; long, 
around 23 to 25 yards; or somewhere in between. 

If space permits more than one lane, additional lanes should be 
used providing sufficient judges can be obtained. (See p. 37.) 

SCORING THE BATTERY 

T-scales for each of the tests and for either battery are 
shown in Tables XI, XII, XIII. The scores for high school girls 
may be read from Table XI, those for college women from Table 
XII. Table XIII is for professional students (majors) in physical 
education. 

The composite on these batteries may be computed in either of 
two ways. The simplest and quickest is to take the average of the 
T-scores earned on the three or four tests given. For example, if a 
student’s T-scores are obstacle, 58; basketball throw, 62; broad 
jump, 60; then the composite score representing the level of motor 
ability is the average of the three T-scores, or 60. 
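The averaging method can be written out as a short Python sketch. The function name is hypothetical; the T-scores are those of the example just given.

```python
def composite_t_score(t_scores):
    """Composite motor ability score: the plain average of the T-scores earned."""
    if not t_scores:
        raise ValueError("at least one T-score is required")
    return sum(t_scores) / len(t_scores)


# Obstacle race 58, basketball throw 62, broad jump 60.
print(composite_t_score([58, 62, 60]))  # 60.0
```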

The second method uses the regression equation derived from 
the multiple correlation. For the three tests the equation is: 

2.0 basketball throw + 1.4 broad jump - 1.0 obstacle race

If the four tests are used without the obstacle race the equation
reads: 0.7 basketball throw + 2.0 dash + 1.0 passes + 0.5 broad jump
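The two weighted composites can be sketched in Python as below. The weights are read from the equations above as 2.0, 1.4 and 1.0 for the three-test battery and 0.7, 2.0, 1.0 and 0.5 for the four-test battery. The sketch assumes the actual scores are entered in the units used in the T-score tables (feet, inches, seconds, yards, and number of passes); the function names and the sample scores are hypothetical.

```python
def gma_three_tests(basketball_throw_ft, broad_jump_in, obstacle_race_sec):
    """Weighted composite for the basketball throw, broad jump, and obstacle race."""
    # The obstacle race is subtracted because a lower time is a better performance.
    return (2.0 * basketball_throw_ft
            + 1.4 * broad_jump_in
            - 1.0 * obstacle_race_sec)


def gma_four_tests(basketball_throw_ft, dash_yds, wall_passes, broad_jump_in):
    """Weighted composite for the battery given without the obstacle race."""
    return (0.7 * basketball_throw_ft
            + 2.0 * dash_yds
            + 1.0 * wall_passes
            + 0.5 * broad_jump_in)


# Hypothetical student: 40-foot throw, 70-inch jump, 22.0-second obstacle race,
# 20 yards in the dash, 10 wall passes.
print(round(gma_three_tests(40, 70, 22.0), 1))
print(round(gma_four_tests(40, 20, 10, 70), 1))
```

The resulting composite would then be looked up in the G.M.A. column of the T-scale table that applies to the group tested.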

The actual score may be multiplied by the proper weighting and 



TABLE XI

T-Scores for Motor Ability Tests for High School Girls

(Table XI: values not recoverable from this copy.)


TABLE XII 

T-Scores for Motor Ability Tests for College Women 

T-     Wall     Basket-  Broad    4-Second  Obstacle   G.M.A.   G.M.A.   T-
Score  Pass     ball     Jump     Dash      Race       (a)      (b)      Score
                Throw    (in.)    (yds.)    (sec.)
                (ft.)
       (1187) c (1162) c (1167) c (1173) c  (1130) c   (1880) c (1228) c

84     18       ..       ..       ..        ..         ..       ..       84
83     ..       75       84-85    29        17.0-17.9  ..       ..       83
82     ..       ..       ..       ..        18.0-18.9  ..       ..       82
81     ..       ..       82-83    ..        ..         ..       206-207  81
80     ..       65       ..       ..        19.0-19.9  146-147  204-205  80
79     17       ..       ..       ..        ..         ..       202-203  79
78     ..       64       80-81    ..        20.0-20.4  144-145  194-201  78
77     ..       63       ..       28        ..         142-143  192-193  77
76     16       62       ..       ..        ..         140-141  190-191  76
75     ..       60       ..       ..        ..         138-139  188-189  75
74     15       59       ..       ..        ..         136-137  186-187  74
73     ..       58       78-79    ..        20.5-20.9  ..       184-185  73
72     14       57       ..       27        ..         134-135  182-183  72
71     ..       56       76-77    ..        ..         132-133  180-181  71
70     ..       55       ..       ..        21.0-21.4  ..       178-179  70
69     13       54       74-75    ..        ..         130-131  176-177  69
68     ..       53       ..       ..        ..         128-129  170-175  68
67     ..       52       ..       26        21.5-21.9  ..       168-169  67
66     ..       51       72-73    ..        ..         126-127  164-167  66
65     ..       50       ..       ..        22.0-22.4  124-125  162-163  65
64     12       49       70-71    25        ..         122-123  160-161  64
63     ..       47       ..       ..        22.5-22.9  ..       156-159  63
62     ..       46       68-69    ..        ..         120-121  152-155  62
61     11       45       ..       ..        23.0-23.4  118-119  150-151  61
60     ..       44       66-67    24        ..         116-117  148-149  60
59     ..       43       ..       ..        23.5-23.9  ..       146-147  59
58     ..       41       64-65    ..        ..         114-115  142-145  58
57     ..       40       ..       23        24.0-24.4  112-113  140-141  57
56     ..       39       ..       ..        ..         ..       138-139  56
55     ..       38       62-63    ..        24.5-24.9  110-111  134-135  55
54     10       37       ..       ..        ..         108-109  132-133  54
53     ..       36       60-61    22        25.0-25.4  ..       130-131  53
52     ..       35       ..       ..        ..         106-107  126-129  52
51     ..       ..       58-59    ..        25.5-25.9  104-105  124-125  51
50     ..       34       ..       ..        ..         ..       122-123  50
49     ..       33       56-57    21        26.0-26.4  102-103  120-121  49
48     9        32       ..       ..        ..         100-101  116-119  48
47     ..       31       ..       ..        26.5-26.9  98-99    114-115  47
46     ..       ..       54-55    20        27.0-27.4  ..       110-113  46
45     ..       30       ..       ..        ..         ..       108-109  45
44     ..       ..       52-53    ..        27.5-27.9  96-97    106-107  44
43     ..       29       ..       ..        ..         ..       102-105  43
42     8        28       50-51    19        28.0-28.4  94-95    100-101  42
41     ..       27       ..       ..        28.5-28.9  92-93    98-99    41
40     ..       26       48-49    ..        29.0-29.4  ..       96-97    40
39     ..       ..       ..       ..        ..         90-91    94-95    39
38     ..       ..       46-47    18        29.5-29.9  ..       90-93    38
37     ..       25       ..       ..        30.0-30.4  88-89    88-89    37
36     7        ..       44-45    ..        30.5-30.9  86-87    84-87    36
35     ..       24       ..       17        ..         ..       82-83    35
34     ..       23       42-43    ..        31.0-31.4  84-85    80-81    34
33     ..       ..       ..       ..        31.5-31.9  82-83    78-79    33
32     ..       22       40-41    ..        ..         80-81    76-77    32
31     6        21       ..       16        32.0-32.4  ..       74-75    31
30     ..       ..       38-39    ..        32.5-32.9  78-79    72-73    30
29     ..       20       ..       ..        33.0-33.4  ..       70-71    29
28     ..       ..       36-37    15        33.5-33.9  76-77    66-69    28
27     5        19       ..       ..        ..         ..       64-65    27
26     ..       ..       34-35    ..        34.0-34.4  74-75    62-63    26
25     ..       18       32-33    14        34.5-34.9  ..       60-61    25
24     ..       ..       ..       ..        35.0-35.4  72-73    58-59    24
23     4        17       30-31    13        ..         ..       56-57    23
22     ..       16       ..       ..        ..         70-71    54-55    22
21     ..       15       ..       ..        37.0-37.4  ..       38-39?   21
20     3        ..       ..       ..        37.5-37.9  68-69    ..       20
19     2?       ..       ..       ..        ..         66-67    ..       19
18     ..       13       ..       ..        ..         64-65    ..       18
17     1        ..       24-25    12        ..         ..       ..       17
16     ..       ..       ..       ..        38.0-38.4  ..       ..       16
15     ..       ..       ..       ..        ..         62-63    ..       15

a = .7 basketball throw + 2.0 dash + 1.0 passes + .5 broad jump. 
b = 2.0 basketball throw + 1.4 broad jump - 1.0 obstacle race. 
c = Indicates the number of subjects on which the scale is based. University of Iowa students, over a five-year period. 
.. = no entry at this T-score. ? = reading uncertain in this copy. 




TABLE XIII 

T-Scales for Motor Ability Batteries for Physical Education 
Major Students 

T-Score  G.M.A. (1) a  G.M.A. (2) b     T-Score  G.M.A. (1) a  G.M.A. (2) b
         (263) c       (278) c                   (263) c       (278) c

79       166-up        ..               45       116-117       144-147
75       156-160       229-up           44       ..            140-143
73?      154-155       ..               43       114-115       ..
71       152-153       224-228          42       112-113       ..
70       150-151       220-223          41       ..            136-139
69       148-149       ..               40       110-111       ..
68       ..            216-219          39       108-109       132-135
66       146-147       212-215          38       ..            128-131
65       ..            208-211          37       106-107       ..
64       144-145       204-207          36       ..            124-127
63       142-143       200-203          35       104-105       120-123
61       140-141       196-199          34       ..            116-119
60       138-139       192-195          32       102-103       ..
59       136-137       188-191          31       100-101       ..
58       134-135       ..               30       ..            112-115
57       ..            184-187          29       98-99         ..
56       132-133       180-183          27       96-97         ..
55       130-131       176-179          26       92-95         108-111
54       ..            172-175          25       90-91         96-107
53       128-129       ..               23       86-89         ..
52       126-127       168-171          21       84-85         92-95
51       ..            164-167
50       124-125       160-163
49       122-123       156-159
47       120-121       152-155
46       118-119       148-151

a = .7 basketball throw + 2.0 dash + 1.0 passes + .5 broad jump. 
b = 2.0 basketball throw + 1.4 broad jump - 1.0 obstacle race. 
c = Indicates the number of subjects on which the scale is based. University of Iowa students. 
.. = no entry at this T-score. ? = reading uncertain in this copy. 


these products added. It will be faster, however, to use Table XIV. 
For example, 

basketball throw 35 feet 
dash 20 yards 
passes 10 
broad jump 61 inches 

It is quicker to look at Table XIV and find the value of .7 of the 
basketball throw as 24.5 rather than to multiply it out for each 
case. Doubling the dash score or taking half of the broad jump can 
be done mentally; with little time and effort the total score can 
be added. In this example the products 24.5, 40, 10, and 30.5 total 
105. Table XIV also provides a multiplication table of 1.4 
broad jump. 
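
The weighting method can be sketched in Python as well. The weights below are read from the regression equations for the three-item and four-item batteries; the names `THREE_ITEM`, `FOUR_ITEM`, and `weighted_composite` are illustrations only, and rounding the result to one decimal place is an assumption about how the composite would be recorded.

```python
# Weights as given in the two regression equations (raw scores, not T-scores).
THREE_ITEM = {"basketball_throw": 2.0, "broad_jump": 1.4, "obstacle_race": -1.0}
FOUR_ITEM = {"basketball_throw": 0.7, "dash": 2.0, "passes": 1.0, "broad_jump": 0.5}


def weighted_composite(raw_scores, weights):
    """Multiply each raw score by its weight and add the products."""
    missing = set(weights) - set(raw_scores)
    if missing:
        raise KeyError(f"missing raw scores: {sorted(missing)}")
    total = sum(weights[name] * raw_scores[name] for name in weights)
    return round(total, 1)  # one decimal place, as in Table XIV


# The example from the text: 35 feet, 20 yards, 10 passes, 61 inches.
example = {"basketball_throw": 35, "dash": 20, "passes": 10, "broad_jump": 61}
print(weighted_composite(example, FOUR_ITEM))  # 105.0
```

A composite obtained in this way is then read against the G.M.A. column of the appropriate T-scale.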

TABLE XIV 

Multiplication Tables for Motor Ability Test Batteries 

3 Item Battery                        4 Item Battery
  2.0 Basketball throw (feet)           .7 Basketball throw (feet)
+ 1.4 Broad jump (inches)             + 2.0 Dash (yards)
- 1.0 Obstacle race (seconds)         + 1.0 Ball pass (times)
                                      + .5 Broad jump (inches)

BROAD JUMP X 1.4                            BASKETBALL THROW X .7

Raw score   x 1.4     Raw score   x 1.4     Raw score   x .7      Raw score   x .7
32          44.8      57          79.8      20          14.0      45          31.5
33          46.2      58          81.2      21          14.7      46          32.2
34          47.6      59          82.6      22          15.4      47          32.9
35          49.0      60          84.0      23          16.1      48          33.6
36          50.4      61          85.4      24          16.8      49          34.3
37          51.8      62          86.8      25          17.5      50          35.0
38          53.2      63          88.2      26          18.2      51          35.7
39          54.6      64          89.6      27          18.9      52          36.4
40          56.0      65          91.0      28          19.6      53          37.1
41          57.4      66          92.4      29          20.3      54          37.8
42          58.8      67          93.8      30          21.0      55          38.5
43          60.2      68          95.2      31          21.7      56          39.2
44          61.6      69          96.6      32          22.4      57          39.9
45          63.0      70          98.0      33          23.1      58          40.6
46          64.4      71          99.4      34          23.8      59          41.3
47          65.8      72          100.8     35          24.5      60          42.0
48          67.2      73          102.2     36          25.2      61          42.7
49          68.6      74          103.6     37          25.9      62          43.4
50          70.0      75          105.0     38          26.6      63          44.1
51          71.4      76          106.4     39          27.3      64          44.8
52          72.8      77          107.8     40          28.0      65          45.5
53          74.2      78          109.2     41          28.7      66          46.2
54          75.6      79          110.6     42          29.4      67          46.9
55          77.0      80          112.0     43          30.1      68          47.6
56          78.4      81          113.4     44          30.8      69          48.3






It would seem to make very little difference which method is 
used, that is the average of the T-scores or the weighting of raw 
scores, since they yield composites which correlate very highly. 





EVALUATING THE BATTERY 

Let us now analyze these tests and the method of 
administration which has been outlined. As measures of innate 
capacity and educability the following points seem significant: 

(1) The obstacle race presents skills relatively new as a speed 
event, yet within the range of experience so that there is no ques- 
tion as to the problem presented. 

(2) The obstacle race presents a sequence of movements which 
is a test of the person’s ability to remember directions and to adjust 
for the next movement while still performing a preceding one. 

(3) The obstacle race puts a premium on weight control, bal- 
ance, total body coordination, and agility. 

(4) By avoiding specific instructions on how to throw the balls 
or make the jump, the tests measure more adequately than would 
otherwise be possible the performer’s knowledge and ability or 
powers of observation acquired through previous training or ex- 
perience. 

As measures of achievement and general ability to perform, the 
following points should be noted: 

(1) The broad jump is related to leg strength, coordination of 
arms and legs, and an understanding of the use of effort and bal- 
ance with respect to one’s own body movement. 

(2) The basketball throw involves strength, coordination of body 
and arms, ball handling, and an understanding of the use of effort 
with respect to some other object. 

(3) The wall pass is primarily ball handling, including eye-hand 
coordination, speed of reaction, and an understanding of the reac- 
tion of balls at different angles and speeds. 

(4) The dash is considerably more than a pure speed event. 
Because of its brevity, the start is a very important element. The 
person who makes a good start and gets up speed quickly covers 
more distance. The person who gets a slow start does not have 
time to make up for that slowness. This is a matter of weight control 
and force as well as reaction time and is very similar to the situa- 
tions presented in most sports where there are many quick starts. 

(5) The parallel lines used for the basketball throw for distance 
give an advantage to those who can control the ball sufficiently to 
deliver it in a straight line. 

Since we have defined general motor ability as inclusive of both 
aptitude and achievement, the two being impractical and unde- 
sirable to separate, this battery would seem to be especially suited 
to measurement of that general ability. 

Disadvantages of the tests are not numerous and can be largely 
overcome by proper administration. 

(1) The tests must all be administered individually except the 
wall pass. They may all be given simultaneously if there is sufficient 
help, or one each on successive days and the rest of the space used 
for regular class activity. Most of the mass tests which are some- 
times used require one or two class periods to give and are, there- 
fore, no more economical. If there is sufficient assistance the tests 
may be given as a part of the physical examination though the 
students usually do not get sufficient warm-up under these circum- 
stances. 

(2) Students may learn about the tests and practice on them. 
This can be prevented by giving them to successive classes without 
leaving intervening class periods with opportunities for practice. 

SUBSTITUTIONS IN THE BATTERY 

Other tests may be substituted for these if facilities 
prohibit their use, or if for any other reason they seem inadvisable. 
Suggestions for substitutions are as follows: 

1. Shuttle Race 

Form 1. Parallel lines 15 feet apart. Score is the number of times 
the performer can cross between the lines in 15 seconds. 

Form 2. Parallel lines 25 feet apart marked in 5-foot zones. 
(See Figure 29.) Work in partners. 

Description: Start at line X, run to line Y, change direction and 
return to X. Repeat this as many times as possible in 30 seconds. 
Record the number of times your partner runs each length of the 
course and note the letter of the area the runner reaches at the 
end of the 30 seconds. (Example, 10-B) 




Suggestions: It is important that the runner wait until the 
whistle blows, and that lines X and Y be touched each time, and 
that the scorer record the area that the runner is in at the second 
whistle. 

The shuttle race may be substituted for the obstacle race or the 
dash. 
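
If Form 2 records such as "10-B" are to be ranked or averaged, they must first be reduced to a single number. The text does not prescribe a conversion, so the following Python sketch rests on assumptions: zone A is taken to lie next to line X and zone E next to line Y, each of the five zones counts as one unit, and the zone the runner is in at the second whistle is not counted. The name `shuttle_zones_covered` is hypothetical.

```python
ZONES = "ABCDE"  # assumed order from line X (start) toward line Y


def shuttle_zones_covered(record):
    """Convert a record such as '10-B' into the number of 5-foot zones covered.

    Assumption (not stated in the text): only zones already passed through
    are counted, not the zone the runner occupies at the whistle.
    """
    lengths_text, zone = record.strip().upper().split("-")
    lengths = int(lengths_text)
    if lengths < 0 or len(zone) != 1 or zone not in ZONES:
        raise ValueError(f"unrecognized record: {record!r}")
    index = ZONES.index(zone)
    # After an even number of lengths the runner is heading from X toward Y,
    # so zones are entered in the order A to E; otherwise in the order E to A.
    passed = index if lengths % 2 == 0 else len(ZONES) - 1 - index
    return lengths * len(ZONES) + passed


print(shuttle_zones_covered("10-B"))  # 51
```

Under these assumptions the example record 10-B becomes 51 zones, that is, ten full lengths of five zones each plus zone A on the eleventh length.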


[Figure 29: floor diagram showing, from top to bottom, line Y, zones E, D, C, B, and A, and line X.] 

Figure 29. Floor Markings for Shuttle Race 
X: starting line 
X, Y: lines on which to reverse direction 
A, B, C, D, E: zones in the shuttle area 

2. Jump and Reach 

Stand facing the wall, toes touching, with both hands raised 
overhead. Reach evenly with both hands and mark the height of 
the reach. Then turn sideward to the wall, jump and reach with 
one hand, touching as far up the wall as possible. The score is the 
difference between standing and jumping reach. If the wall is such 
that water will show on it and not leave the wall disfigured, the 
easiest method is simply to dip the fingers in water when starting 
the test and then measure promptly. If this is not feasible, short 
pieces of chalk must be used. 

With college women the original reach may be measured more 
accurately if they stand with backs to the wall. The assistant 
marks the reach. 

This test may be substituted for the broad jump if mats or 
pit are not available for the jumping. 

3. Sand Bag Throw 

Use sand bags 4 inches square weighing 1 pound each, and with 
a string tied tightly around the middle of each to prevent its being 
thrown so that it will “sail” through the air. The administration 
of the test is identical with the basketball throw for distance in 
all other respects. 

This test may be substituted for the basketball throw for dis- 
tance where balls are lacking or the space is short. 

USING THE MOTOR ABILITY SCORES 

The motor ability score is used to section classes, to 
arrange squads or teams, or to determine the level of achievement 
expected in future work, and the amount of individual help which 
will be needed to achieve satisfactory results. For that reason the 
tests should be given at the beginning of certain phases of the 
student’s physical education experience. This does not mean that 
they need to be given every year. It is probably best to give this 
battery at the beginning of the junior high school years, at the 
beginning of the senior high school course, and when entering 
college. The results will be used then for a two or three year period. 
Improvement doubtless occurs during an interval of this length but 
the relative standing of students will not change appreciably, unless 
some students practice specifically on these test items while others 
do not. 

One specific example of the value of general ability tests lies in 
the economy of time for later testing. Placement by the general 
test can be used in connection with every activity, and therefore 
permits a very much shortened battery at the beginning of each 
activity season for classification in that activity. The general 
battery gives a fairly good estimate of the level of ability to be 
expected of a person who is just starting to learn a new activity. If 
later achievement in that activity shows status markedly superior 
to that predicted it can almost always be attributed to extra 
practice, to exceptional motivation, or to unusual interest and effort. 
Likewise, distinctly lower status than that predicted can usually 
be found to be due to lack of effort and practice. 

Another use of the motor ability measures is in selecting the 
groups at both ends of the scale for special consideration. For 
example, the lowest 15 to 25 per cent in ability too frequently are 
simply spotted as dubs and left to shift for themselves as best they 
can in the class. The result is discouragement, dislike for the 
activity, and eventually lack of cooperation, not to mention the 
fact that they usually remain poor in skills. If those individuals 
can be given special help, preferably in classes by themselves for a 
time, most of them profit considerably (11). They may improve on 
specific skills by special help, they may learn to analyze skills 
more carefully, to compensate for some of their shortcomings in 
capacity or aptitude, and above all they have the opportunity to 
practice skills in a sympathetic group where they do not feel 
unnecessarily self-conscious. 

The upper 15 to 25 per cent in ability can also profit by special 
consideration. When all levels of ability are taught in the same 
class, by the same procedure, at the same rate, the most capable 
ones are not challenged by the material presented and are often 
discouraged because they work and play constantly with others of 
very poor skills. Those with high general ability may be selected 
for leaders' classes, given additional skills or projects on which 
to work, placed in advanced classes, or placed in special classes 
with others of like ability. The latter procedure is very frequently 
followed in college courses and these groups make phenomenal 
progress in most activities. 
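
The selection of the two end groups can be sketched in Python as follows. The function name, the student names, and the scores are hypothetical, and the choice of 20 per cent is simply one value within the 15 to 25 per cent range mentioned; rounding the group size down is likewise an assumption.

```python
def split_by_ability(scores, fraction=0.20):
    """Return (lowest, middle, highest) lists of names ranked by composite score.

    `scores` maps each student to a composite motor ability score; `fraction`
    is the share of the class placed in each end group.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be greater than 0 and at most 0.5")
    ranked = sorted(scores, key=scores.get)   # poorest composite first
    k = int(len(ranked) * fraction)           # size of each end group, rounded down
    lowest = ranked[:k]
    highest = ranked[len(ranked) - k:]
    middle = ranked[k:len(ranked) - k]
    return lowest, middle, highest


# Hypothetical class of ten students with four-item composites.
class_scores = {"Adams": 98, "Baker": 121, "Clark": 105, "Davis": 133, "Evans": 87,
                "Frank": 110, "Grant": 102, "Hayes": 126, "Irwin": 94, "Jones": 115}
low, mid, high = split_by_ability(class_scores, fraction=0.20)
print(low)   # ['Evans', 'Irwin']
print(high)  # ['Hayes', 'Davis']
```

The middle group is returned as well, so that the same ranking may serve for arranging squads of mixed or like ability.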

In short, the students of low ability can be taught to work hard, 
to achieve modest results and to work objectively on shortcomings, 
and to understand that if certain skills are to be passed or standards 
met, they must spend more time and effort than others. For 
example, beginning swimmers who are afraid of the water would 
be perfectly willing to concede that more effort is necessary for 
them to learn to swim than for those who are not afraid. Likewise, 
the persons who have little apparent aptitude for complex motor 
skills are just as aware of their difficulty as the frightened swimmer. 
If they are motivated to achieve some degree of skill, they will 
recognize the need for practice and be willing to exert the 
necessary effort. 

In a similar manner, the student of high ability can be inspired 
by the variety of skills which may be acquired or the high level of 
skill which may be achieved. In both cases the students are aided 
in setting up their own goals, and their effort is directed always 
toward improvement as well as toward surpassing someone else. 

BIBLIOGRAPHY 

1. Alden, Florence, Horton, Margery, and Caldwell, Grace: “A Motor Ability Test 
for University Women for the Classification of Entering Students into 
Homogeneous Units,” Research Quarterly, 3, March 1932, p. 85 

2. Hatlestad, Lucille: “Motor Educability Tests for Women College Students,” 
Research Quarterly, 13, January 1942, p. 10 

3. Johnson, Granville: “Physical Skill Tests for Sectioning Classes into 
Homogeneous Units,” Research Quarterly, 3, March 1932, p. 128 

4. Kistler, J. W.: “The Homogeneous Grouping of Junior and Senior High School 
Boys for Physical Education Class Activities,” Research Quarterly, 8, December 
1937, p. 11 

5. McCloy, C. H.: “The Measurement of General Motor Capacity and General 
Motor Ability,” Research Quarterly, 5, Supplement March 1934, p. 46 

6. McCloy, C. H.: “Recent Studies in the Sargent Jump,” Research Quarterly, 3, 
May 1932, p. 235 

7. McCloy, C. H.: “An Analytical Study of the Stunt Type Test as a Measure of 
Motor Educability,” Research Quarterly, 8, October 1937, p. 46 

8. Metheney, Eleanor: “Studies of the Johnson Test as a Test of Motor 
Educability,” Research Quarterly, 9, December 1938, p. 105 

9. Niehaus, Marian: A Study of Tests for Dividing Junior High School Girls into 
Homogenous Groups for Physical Education. Unpublished M.A. thesis, 
University of Iowa, 1935 

10. Powell, Elizabeth, and Howe, Eugene C.: “Motor Ability Tests for High School 
Girls,” Research Quarterly, 10, December 1939, p. 81 

11. Salit, Elizabeth Powell: “Development of Fundamental Sport Skills in Freshman 
College Women of Low Motor Ability,” Research Quarterly, 15, December 1944, 
p. 330 

12. Scott, M. Gladys: “The Assessment of Motor Ability of College Women,” 
Research Quarterly, 10, October 1939, p. 63 

13. Scott, M. Gladys: “Motor Ability Tests for College Women,” Research 
Quarterly, 14, December 1943, p. 402 



7 

Achievement Ratings and Progressions 


There are some activities which do not lend them- 
selves to objective testing. In spite of that fact it may be desirable 
to know the relative status of members of a class, or to motivate 
effort by noting rate of improvement. This can be made possible 
by subjective, systematic ratings of the performance of each per- 
son. These subjective ratings may serve as a substitute for tests. 
Dancing and diving are examples where only subjective ratings 
might be made. In other cases the ratings may serve as a supplement 
to the tests. An example of this combination might be found 
in tennis. Tennis tests may show accuracy of placement but usually 
fail to discriminate between strokes varying in speed, flight, bounce, 
and other factors influencing the ease with which an opponent 
might return the ball. A test of accuracy combined with a rating 
of form would be much more valuable than the test alone. 

Other activities are of a self-testing type, such as swimming or 
bowling. If there are many separate units or skills constituting the 
activity, then some kind of progression scale serves to indicate 
desirable sequence of learning and to afford discrimination in the 
rate at which achievement proceeds. Ratings and achievement 
progressions have certain elements in common. However, they will 
be discussed separately, with scales and charts provided for those 
activities where they will be most useful. 

RATINGS 


Certain preliminary preparations must be made be- 
fore ratings can proceed. These are inter-related. 

(1) The scale or range must be determined. If the activity is 
not easily judged for differences in efficiency or ease, then a “pass” 
or “fail” category may be sufficient. Usually better results will 
be obtained if at least three categories are used; i.e., the “fail” 
is retained, the “pass” is subdivided into fair and good. The words, 
which are descriptive, may be used or numerical values may be 
substituted, as fail = 0; fair = 1; good = 2. Or, if no failures 
are to be anticipated the scale may read, poor = 1; fair = 2; 
good = 3. Then by adding the ratings given on two or more items 
or skills, a composite score is obtained, with the highest number 
indicating greatest ability. The scale or range may be increased 
to five points but it is seldom advisable to set up more than five 
categories because discrimination between like cases becomes too 
difficult and too time consuming. 
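
The adding of ratings into a composite can be sketched in Python as follows. The three-point scale is the one given in paragraph (1); the names `SCALE` and `composite_rating` are illustrations only, and the three ratings in the example are hypothetical.

```python
# The three-category scale from the text: fail = 0, fair = 1, good = 2.
SCALE = {"fail": 0, "fair": 1, "good": 2}


def composite_rating(ratings, scale=SCALE):
    """Add the numerical values of the ratings given on two or more skills."""
    return sum(scale[rating] for rating in ratings)


# Hypothetical ratings of one performer on three separate skills.
print(composite_rating(["good", "fair", "good"]))  # 5
```

Where no failures are anticipated, the alternative scale poor = 1, fair = 2, good = 3 may be passed as the `scale` argument instead.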

(2) Each point on the scale must be clearly defined. This defi- 
nition must be in terms of the particular activity and describe 
a level of ability which will be found in the class. For example, in 
rating a swimming stroke: 

3 - good, position good, coordination and timing good, uses 
drive well, and keeps resistance at a minimum; easy, 
relaxed stroke. 

2 - fair, position reasonably good, coordination correct but 
lacks proper rhythm and force. 

1 - poor, stroke is recognizable but position is poor or 
inconsistent, too much effort with little result, lacks ease 
and relaxation, can swim width of pool. 

0 - unable to swim across the pool maintaining proper floating 
position and using stroke throughout. 

If more categories are added then the analysis of coordination 
and timing is carried further with certain types of errors holding 
the swimmer down on the scale in contrast to other errors which 
might be considered less serious. 

(3) The opportunity for rating must be planned. Ratings 
should not be made from memory of what the performer does; 
but rather the performer should be seen in action. In case of swim- 
ming it might mean watching the swimmers in small groups or 
individually while they try to swim a specified distance. In case 
of tennis it might mean watching the player in a game on the 
court, or stroking against a backboard. Rating should then be 
done as objectively as possible, judging the performance and 
forgetting the personality and previous impressions which that person 
has made. 

(4) Score sheets should be planned for greatest ease in rating 
and totalling ratings. Plans should also be made for the student to 
see his ratings, especially if opportunity is to be given for further 
practice on those skills. 

(5) If names of the performers are not thoroughly familiar to 
the teacher or the person making the rating, a number should be 
pinned on each performer in order that the rating may be done 
accurately and promptly. 

RATING OF DIVING 

Diving is judged subjectively whether in competi- 
tive meets, in trying for swimming and diving awards, or in regu- 
lar class instruction and rating. This subjective rating is made 
reasonably accurate by setting up the important elements of the 
dive and a scale which determines the proper classification for a 
performance with various combinations of these elements. The 
following discussion illustrates this procedure for diving in gen- 
eral. In some cases of special dives additional specific points may 
be added. 

ELEMENTS OF GOOD FORM IN DIVING 

1. The position of readiness is erect and well balanced. 

2. The approach is legal (at least three steps in case of a run- 
ning dive). 

3. The approach is smooth, easy, and in good posture. 

4. The hurdle and take-off are timed with the board. 

5. The body is straight, with effective arm and leg action at 
the take-off. 

6. The height of the flight is sufficient to permit the neces- 
sary movements or position. 

7. Body position during flight conforms to specification of the 
dive (i.e., tuck, pike, lay-out, twist, or number of 
revolutions). 

8. All body movements during flight are smooth, easy, and 
limited to those which are essential. 

9. The entry into the water is with body straight, close to the 
line of the take-off. 

For regular class use a scale of five points is recommended. It 
should be arranged as follows: 

5 - excellent, dive meets all specifications for good form, no 
apparent errors which call for further coaching. 

4 - good, dive gives a general impression of good form, minor 
variations exist which would improve the dive if 
corrected. 

3 - average, dive meets the basic specifications but lacks 
smoothness and ease, or lacks control in some one respect 
which affects the dive as a whole. (For example, too 
much forward lean on the take-off decreases height 
and moves the point of entry farther from the take-off.) 

2 - fair, dive is inadequate and full of errors, but has some 
indication of control, or merit in some aspect. 

1 - poor, dive is recognizable but fails to meet the standard in 
practically every element involved. 

Competitive diving rules usually require that dives be judged 
on a ten point scale. Such a scale merely differentiates within 
each of these five categories and gives corresponding points. The 
above scale may be broken down as follows: 

5 = 8½, 9, 9½, or 10 

4 = 6½, 7, 7½, or 8 

3 = 4½, 5, 5½, or 6 

2 = 2½, 3, 3½, or 4 

1 = ½, 1, 1½, or 2 

However, in judging competitive diving the dive is automatically 
scored as zero if it is not the particular variation of the dive which 
is entered or announced. Competitive diving further varies the 
points awarded according to the difficulty of the dive. This has 
nothing to do with the judging. The dive is usually given the 
average of the ratings of three judges each working independently. 
Let us assume three different dives each of which received an 
average rating of 8 from the three judges. The rules may specify 
that one dive is classed as 1 in difficulty, another is more difficult 
and classed as 2, the third is still more difficult and classed as 3. 
The number of points won by these three dives is the judges’ 
rating, 8, multiplied by the difficulty value 1, 2, or 3. Therefore, 
the dives win points 8, 16, and 24 respectively. Usually this pro- 
cedure is not followed in class work where students are apt to be 
working on dives similar in difficulty. 
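
The arithmetic of competitive scoring, and the relation between the ten-point and five-point scales, can be sketched in Python as follows. The function names are illustrations only. The halving rule in `class_category` is an inference from the breakdown given above (8½ to 10 corresponds to 5, 6½ to 8 to 4, and so on) and is not stated in the rules themselves.

```python
import math


def dive_points(judge_ratings, difficulty=1):
    """Average the judges' ratings and multiply by the difficulty value."""
    if not judge_ratings:
        raise ValueError("at least one judge's rating is required")
    average = sum(judge_ratings) / len(judge_ratings)
    return average * difficulty


def class_category(ten_point_rating):
    """Map a ten-point rating (1/2 to 10) back to the five-point class scale."""
    if not 0.5 <= ten_point_rating <= 10:
        raise ValueError("rating must lie between 0.5 and 10")
    return math.ceil(ten_point_rating / 2)


# The example from the text: an average rating of 8 on dives of difficulty 1, 2, 3.
for difficulty in (1, 2, 3):
    print(dive_points([8, 8, 8], difficulty))  # 8.0, then 16.0, then 24.0

print(class_category(8.5))  # 5
```

A dive scored as zero because it was not the dive announced falls outside the five categories and is therefore rejected by `class_category`.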

The judges will have the most complete view of the dive if one 
stands at the side approximately in line with the end of the board, 
a second on the opposite side a little beyond the end of the board, 
and the third stands back near the rear end of the board. 

It is understood in competitive swimming that diving judges 
must be highly qualified, that the same judges rate all divers, and 
that they each follow the exact specification and scale prescribed 
for the dives. Under these circumstances, the ratings by the vari- 
ous judges will be quite consistent and a satisfactory means of 
evaluating the performance. Exactly the same procedure is neces- 
sary as a part of a swimming test. (See Chapter 9 for a discussion 
of ratings by experts.) 

RATING OF POSTURE 

Evaluation and rating of posture have never been 
thoroughly satisfactory. One method that is used is to ask the 
person to stand in either a good or habitual position and have one or 
more persons rate that position. This procedure is made slightly 
more objective by the use of plumb lines or vertical lines in the 
background and by indication of certain landmarks. A very valid 
objection to this procedure is that one is moving much more of the 
time than standing, and that ability to stand well does not always 
insure ability to move well. 

An alternative for the same type of posture rating is to take 
a posture picture or silhouette and make a rating of the picture. 
This has certain advantages, namely, (1) the record is permanent and 
better comparison of similar figures and more uniformity of rating 
will be obtained, (2) the picture can be taken at successive times 
and direct comparisons made, (3) the picture provides the student 
with an opportunity to see how he looks. The disadvantages are 
that it is a stationary pose, that there is still no objective criterion 
or measure to be employed, and that it is expensive and impractical 
to use in some cases. 

The main objection to these two methods is partially overcome 
by asking the person to perform specific movements and rating 
that performance. This may be done in succession in a 
class or may be done informally by watching the person in daily 
activity doing such things as walking, sitting, going up or down 
stairs, or carrying books or other loads. The latter has the 
advantage of showing the person naturally in activities and emphasizing 
the need for good posture at all times; but it has the disadvantage of 
being time-consuming for the rater, who will find it difficult to see 
all persons in a sufficient number of situations. 

One solution to most of these drawbacks is suggested in the plan 
for class testing discussed below. This is a plan for the organiza- 
tion of the class and selection of activities in which posture is to 
be rated. Whether standing position only is to be rated, or simple 
and complex movements are to be included, the outline of ele- 
ments in posture must be available for the rater. Therefore, let us 
consider such an outline first. 

ELEMENTS IN GOOD STANDING POSTURE 

1. Back and Head 

1. The three curves of the spine should be moderate. 

2. The head should appear well balanced on top of the 
spine with the line of vision and the chin horizontal. 

3. The trunk should appear easily erect without being 
stiff. 

4. The outline of the sternum and ribs should be more 
or less straight in front, with a long vertical axis, 
rather than sunken, collapsed, and with a concave 
axis. 

5. The spine should appear straight when viewed from the 
rear. 




2. Abdomen and Pelvis

1. The line of the abdomen should be straight or very
mildly convex. (There is an exception to this in the
small child up to about six years of age. The abdominal
line is usually convex but the following
point still applies.)

2. The abdominal wall should be mildly firm, not relaxed
and sagging, and not stretched or containing
excessive fat deposits.

3. The pelvis should be held squarely beneath the trunk, 
not with the lower back and abdominal wall project- 
ing forward at a pronounced angle.* 



Figure 30. Illustration of Pelvic Alignment 

4. The lower line of the abdominal wall should be be- 
hind the lower end of the sternum. 

* The pelvis may be considered as a box, even though it is distinctly irregular
in shape. (See Figure 30.) That box should be kept squarely aligned as though
resting on a level surface and not tipped up on edge. Tilting the pelvis invariably
causes the buttocks to project backward more prominently just as it causes increased
concavity of the lumbar spine. However, care must be taken to judge pelvic alignment
by the position of the pelvis itself and not the spine and abdomen above it.
The contour of the buttocks is not an accurate means of judging alignment since
there is great variation in the muscular development and hence of the outline of the
hip. Figure 30 illustrates the same girl with the pelvis in the two positions.

3. Shoulder Girdle and Arms

1. The shoulder blades should not deviate appreciably
from the contour of the spine and thorax. When
viewed from the side as in a posture picture they
should not markedly exaggerate the convex line of
the upper back.

2. The shoulder girdle should be carried far enough 
back that the arms hang easily at the sides. 

3. The shoulder girdle should be retracted without giv- 
ing the appearance of stiffness and without thrusting 
the ribs and sternum forward unduly. 

4. The shoulders should be low, not shrugged or tense. 

4. Feet and Legs 

1. The feet should be parallel. 

2. The inner line of the feet should be straight rather
than convex (not pronated). 

3. The heel cord in the rear should be straight, not 
turned in at ankle level. 

4. The upper surface of the feet should appear straight 
or convex, not sunken and spread just back of the 
toes. 

5. The knees should be straight without rigidity, not 
flexed. 

5. General Alignment and Weight Control 

1. The following landmarks should be situated one above
the other when the person is viewed from the side: 
lobe of the ear, point of the shoulder, hip joint, rear 
of the patella. 

2. The line through these four points should be vertical 
and extend downward through the feet midway of the 
base (heel to ball of foot) which brings it a little in 
front of the ankle joint. 

3. General appearance is of relaxation and control rather 
than rigidity. 

When the rating is made during movement most of the same 
points still apply and the following are added. 




WALKING 

1. Contact for each step should be made first with the heel. 

2. Push-off at each step should come from the toes, principally 
the great toe. Failure to get a drive from the toe gives an 
appearance of rocking over the foot and then simply lifting 
it without noticeable ankle and foot action. 

3. Leg action should be free, without tenseness and without 
conspicuous swaying of the hips in either a lateral or vertical 
direction. 

4. Arms and shoulders should be relaxed but controlled within 
a relatively small arc. 

5. Balance should be maintained over the base without stiffness,
flexion at the hips, or hyper-extension in the lower
back.

6. There should be no jar to the body at the moment of heel 
contact. 

RUNNING 

1. Contact should be made first with the ball of the foot. 

2. The push-off from the toes should be strong. 

3. Vertical bobbing should be eliminated by knee action. 

4. Excessive forward reach by the legs should be eliminated by 
bringing the leg downward and backward for the step rather 
than stretching it out for a long stride. 

5. Body lean should be greater with increased speed, but should 
result from the body inclining forward as a whole, not by 
trunk or hip flexion. 

MISCELLANEOUS ACTIVITIES 

1. The body, including weights to be lifted or carried, should
always be in optimum position of balance, so that muscular
effort is conserved and strain is reduced.

2. Excessive range of movements and superfluous movements 
are always undesirable. 

3. Relaxation should be as complete as is possible for good 
body mechanics in the task involved. 





A very practical and useful diagnostic test might be organized 
as in the following chart which would take care of a squad of 10 
girls. That number can be scored very conveniently at one time.* 



Figure 31. Illustration of Arrangement for Administering Posture Test

A-B Area in which examiner moves to observe each student. The space between
A-B and chairs is for students to perform activities for the rating.


A three point scale could be used: 3 - good, 2 - fair, 1 - poor.
Standing and walking posture would doubtless be included. Use
of the feet in walking is most easily judged separately from the
general rating of walking. The additional items to be included in
the test would probably be chosen from activities such as running,
stair climbing, sitting, stooping, reaching overhead, carrying a
load, pushing or pulling.

In preparing for the test place ten chairs in line, one in front 
of the other, with a little space between. There should be some 
open floor space beside them. Names of squad members are entered 
on the score sheet; when the test is given students are seated in 
the same order as in the list and they remain in the same order 
throughout the test. Adequate rating can be given only if the 
students are dressed in swimming suits or tight fitting suits, and 
are barefooted. 

The examiner stands to one side of the row of chairs.* 

1. Each girl in turn walks a few steps toward the examiner,
turns and walks away again. This gives opportunity for judging 
the foot alignment and pronation. 

2. Each girl in turn walks a few steps forward (with side to the 
examiner). This gives opportunity to examine for heel contact, 
weight transfer and toe drive. 

3. Each girl stands in line while the examiner moves down the 
line rating standing posture. 

4. The girls walk two or three at a time back and forth beside 
the line of chairs. During this the examiner rates the walking 
posture. Having more than one walk at a time helps to avoid self- 
consciousness and unnatural gait. 

5. Girls sit in the chair in a natural sitting position for rating. 
Each rises and then sits again to be judged on balance and move- 
ments. 

6. Movement on stairs should be both up and down. The test 
may be given on real stairs, preferably wide ones for a better view 
and to accommodate two girls at a time; or it may be given on 
stairs constructed for this use in the gymnasium. 

* The test and chart represent modifications of a test developed by the Staff of the
Department of Physical Education for Women, University of Iowa. They are similar
to forms published by that department, to one appearing in Posture and Body
Mechanics by Loraine Frost, University of Iowa Extension Bulletin; and to another
in Tests and Measurements in Health and Physical Education by C. H. McCloy,
F. S. Crofts and Company. Quoted by permission of each of the above publishers.



Figure 32. Sample of Posture Test Score Sheet

















Other items in the test can be set up in a similar manner prefer- 
ably with some properties or setting to make the movements seem 
natural. 
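For the teacher who keeps records by machine, the score sheet described above can be tabulated as in the following minimal sketch. It is an illustration only and not part of the original test: the item names, the student names, and the helper `squad_summary` are all assumptions. It records a rating of 3, 2, or 1 for each student on each item, totals the ratings, and lists the items rated poor so that they can be used for diagnosis.

```python
# Hypothetical sketch: tabulating a three-point posture rating sheet.
# Item names and student names are illustrative, not from the original chart.
SCALE = {3: "good", 2: "fair", 1: "poor"}


def squad_summary(ratings):
    """Return {student: (total, [items rated poor])} for a squad score sheet.

    `ratings` maps each student name to a dict of {item: rating}; every
    rating must be 1, 2, or 3.
    """
    summary = {}
    for student, items in ratings.items():
        for item, score in items.items():
            if score not in SCALE:
                raise ValueError(f"{student}, {item}: rating must be 1, 2, or 3")
        total = sum(items.values())
        weak = sorted(item for item, score in items.items() if score == 1)
        summary[student] = (total, weak)
    return summary


sheet = {
    "Student A": {"feet": 3, "standing": 2, "walking": 2, "sitting": 3},
    "Student B": {"feet": 1, "standing": 2, "walking": 1, "sitting": 2},
}
for name, (total, weak) in squad_summary(sheet).items():
    print(name, total, weak)
```

The total gives a rough general rating, while the list of items rated poor points to where instruction is most needed.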


RATINGS IN DANCE 

Ratings represent about the only approach to 
measurement in dance, and are used for evaluating ability in the 
various types of dance. They can be made on dance steps, such 
as the waltz or schottische, or on the performance of an entire 
dance, as might frequently be the case in square dances, for ex- 
ample. It is better to have just a small group to be watched at 
one time; to alleviate embarrassment, it may be better to have 
several persons dancing but only a few being judged. 

The rating scale on a dance might read as follows: 

5 - excellent, is skillful in steps involved; knows positions and
    floor pattern, and sequence of steps; shows a feeling
    of assurance and of enjoyment, and expresses the spirit
    of the dance.

4 - good, executes the steps correctly in form, rhythm and
    sequence, but is lacking something in ease, naturalness,
    or expression.

3 - average, performs the dance correctly with only minor errors
    which she is able to correct herself or by cues from
    other dancers, apparently knows it well enough to
    enjoy it.

2 - fair, can execute the basic steps involved and can perform
    the dance reasonably well by the lead of partner or
    other dancers; rhythmically she makes an occasional
    error and is unable to readjust readily. She may or
    may not enjoy it depending upon whether she is
    disturbed by her own errors.

1 - poor, performs dance steps poorly; she is almost entirely
    dependent upon others for cues for sequence;
    rhythmically she is inaccurate and apparently unaware of it.



RATING OF SOFTBALL BATTING FORM 

The construction of objective skill tests in batting 
has been made more difficult by the lack of a mechanical device 
which would deliver the ball in the same manner to all batters. 
The ability of the pitcher is known to affect the ability of the 
batter. Since batting comprises the major part of the offensive 
action in a game, it is important that it be measured, and in as 
nearly an objective manner as possible. Having the same pitcher 
deliver the ball to all batters being tested has been tried; usually 
the number of balls that must be delivered is too great for the 
pitcher’s endurance. Batting is a skill in which the element of 
chance plays a large part, and therefore the number of pitches to 
each batter has to be quite great. Such tests are time consuming and 
take considerable space. The use of a rating form during actual 
playing time seems to be the best substitute at present. 

The rating form can be mimeographed, with one cumulative form
for each player. For use in instruction, these forms
should be ready early in the season. They can be
handed to the player after each turn at bat. The same form can be 
used in a few tournament games at the end of the season, with the 
tally marks being recorded in a different color on successive ratings 
so that improvement can be noted. The ratings will be fairly re- 
liable if each batter is rated in three games at the start of the season 
and two or three at the end. Student leaders can be taught to do 
the rating; it is important that they understand the fundamentals 
of good batting form. 

The rater should stand behind the plate umpire and slightly to- 
ward first base, in foul territory. The use of several raters is 
preferable. (See Ratings, p. 232.) 

The detailed form presented here does not include the result of 
each batter’s turn at bat. Such a record can be obtained from the 
score book or seasonal batting averages, and should not unduly 
influence the rating of form. If time is limited the checking of 
errors can be omitted. Their inclusion is mainly to facilitate 
using the blank as a teaching device. 





Date ____________                      Player's name: ____________

Rater's Initials ____________          Captain's name: ____________

                                       Class Hr., Days: ____________

Instructions:

1. Fill in date and your initials on each form.

2. Rate the player each time she bats. Place a tally mark in the space
   which precedes the best description of player's form (good, fair,
   poor) in each of the six categories.

3. Indicate your observation of errors on the right hand half of the
   page, again with a tally mark.

4. Write in any additional errors, and add comments below.


RATING                     ERRORS

1. Grip                    ___ Hands too far apart.
   ___ Good                ___ Wrong hand on top.
   ___ Fair                ___ Hands too far from end of bat.
   ___ Poor

2. Preliminary Stance      In relation to plate, stands:
   ___ Good                ___ too near it.
   ___ Fair                ___ too far from it.
   ___ Poor                ___ too far forward toward pitcher.
                           ___ too far backward toward catcher.
                           ___ Rear foot closer to plate than forward foot.
                           ___ Bat resting on shoulders.
                           ___ Shoulders not horizontal.

3. Stride or Footwork      ___ Fails to step forward.
   ___ Good                ___ Fails to transfer weight.
   ___ Fair                ___ Lifts back foot from ground.
   ___ Poor

4. Pivot or Body Twist     ___ Fails to twist body.
   ___ Good                ___ Fails to "wind up."
   ___ Fair                ___ Has less than 90° of pivot.
   ___ Poor

5. Arm Movement or Swing   ___ Arms are held too close to body.
   ___ Good                ___ Rear elbow held too far up.
   ___ Fair                ___ Bat not held approximately parallel to ground.
   ___ Poor                ___ Batter does not use enough wrist motion.
                           ___ Wrists are not uncocked forcefully enough.

6. General                 ___ Batter's movements are jerky.
   (Eyes on ball,          ___ Batter tries too hard; "presses."
   judgment of             ___ Fails to look at exact center of ball.
   pitches, etc.)          ___ Poor judgment of pitches.
   ___ Good                ___ Batter appears to lack confidence.
   ___ Fair                ___ Poor selection of bat.
   ___ Poor


ADDITIONAL COMMENTS:


Figure 33. Sample Rating Sheet for Softball Batting Form 
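The comparison of early-season and late-season tallies on such a sheet can be sketched as follows. This is an illustration under stated assumptions, not a procedure given by the form itself: the form only collects tally marks, so the point values in `POINTS` (good = 3, fair = 2, poor = 1), the shortened category names, and the sample tallies are all hypothetical.

```python
from collections import Counter

# Shortened names for the six categories on the rating sheet.
CATEGORIES = ["grip", "stance", "stride", "pivot", "swing", "general"]
# Assumed point values; the form itself only collects tally marks.
POINTS = {"good": 3, "fair": 2, "poor": 1}


def mean_rating(tallies):
    """Average point value of the tally marks in one category, or None."""
    marks = sum(tallies.values())
    if marks == 0:
        return None
    return sum(POINTS[level] * n for level, n in tallies.items()) / marks


def improvement(early, late):
    """Change in mean rating per category between two sets of tallies."""
    change = {}
    for cat in CATEGORIES:
        before = mean_rating(early.get(cat, Counter()))
        after = mean_rating(late.get(cat, Counter()))
        if before is not None and after is not None:
            change[cat] = round(after - before, 2)
    return change


# Hypothetical tallies for one player: three early games, three late games.
early = {"grip": Counter(good=1, fair=2, poor=3), "swing": Counter(fair=2, poor=4)}
late = {"grip": Counter(good=4, fair=2), "swing": Counter(good=1, fair=4, poor=1)}
print(improvement(early, late))
```

Categories with no tally marks in either period are simply left out of the comparison, which corresponds to a rater who did not observe that phase of the batting.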




BASKETBALL RATINGS 

It is sometimes desirable to rate basketball players 
in order to get estimates of skill or improvement in some particular
phase of the game. This may be done by watching the players in 
the game, and judging them on a three or five point scale similar 
to other scales illustrated in this chapter. Many of the basketball 
tests in use involve basket shooting because they give objective 
scores. Other skills do not lend themselves so readily to objective 
testing. 

In the women’s game, it has been found by observation and 
some experimentation that skill in use of the bounce is very closely 
related to general skill in the game. Such a rating is particularly 
good to help in classifying players for beginning or advanced 
groups, for classes, or placement on teams. This can be rated most 
readily by having two or three players working together on one half 
of a basketball court. They are asked to pass the ball from one to 
another, to cover the floor as though working through a defense. 
Each is also asked to use a bounce to herself as frequently as pos- 
sible. The bounce may be judged as follows: 

3 —good, keeps ball under control at all times; covers distance, 
keeps ball low, avoids all fumbling and traveling. 

2 —fair, uses the bounce to advantage; but does not cover as 
much distance as might be desirable, or bounces at a 
height which would be intercepted easily, or occa- 
sionally travels. 

1 —poor, gets no advantage from the bounce, and would prob- 
ably lose the ball by interception or traveling in the 
game, bounces straight down to the floor, or travels con- 
sistently. 

General footwork may be judged in the same way in small groups
of two or three. The additional points to be considered would be
ability to start, run, stop, or change direction quickly by means of
a reverse turn or pivot; to judge speed and position with respect
to space on the floor and action of teammates' play; to avoid
traveling when in possession of the ball, but to be able to keep the play
moving. The main justification for this type of rating is (1) that
footwork is very difficult to evaluate in actual tests, and (2) that the
rating approaches the game situation more adequately than many tests. The
player's rating in this case is much more apt to be affected by the
ability of the partner being tested than is true in rating the bounce.
For that reason it may be desirable sometimes to change pairings
for this rating.

RATINGS IN TENNIS

Accuracy of placement of the serve can be tested ob- 
jectively; likewise, ability to keep a ball in play. (See skill test, p. 
100.) Form must be rated subjectively. This can be done while the 
players are playing regular games, using the scale below as a guide. 
It is presumed that the students are divided into two ability group- 
ings for instructional purposes and that the rater will have had 
contact with the group over a period of time. 

1. For Beginners

5 - good, executes all strokes in good form. May have
    played before this term, has learned rapidly,
    perhaps worked outside of class or watched others
    play. Profits by all suggestions.

4 - above average, plays the game sufficiently well to avoid
    being conspicuous on the courts for poor playing.
    Has shown definite improvement and is anxious
    to learn.

3 - average, shows fair but somewhat inconsistent form;
    knows the essentials of the game, scoring, etc.

2 - near dub, can stroke in fair form but is careless; has
    improved some during the term but has little
    knowledge of the game.

1 - dub, has poor strokes and has made little progress.
    Makes little effort to improve.

2. For Intermediates

5 - expert, has good form and plays consistently well, is
    fast, knows the game and uses excellent strategy,
    and can play either singles or doubles reasonably
    well. This does not imply that the player is an
    experienced tournament player.

4 - near expert, usually has good form and plays smart
    tennis; perhaps plays singles better than doubles,
    or vice versa; is somewhat erratic but knows the
    game.

3 - average, usually plays a good game, tries out new tactics,
    and is analytical concerning own game. The
    greatest need is practice.

2 - fair, is weak on some techniques but has ability to
    improve, understands weaknesses, and has
    knowledge of game procedure.

1 - poor, has not overcome poor habits of technique, has
    little knowledge of the game or analytical ability.

The above scale is general in nature and can be used by various 
raters even though they disagree about certain points on form. An- 
other method of rating which is more specific but is limited to the 
ability to execute strokes is described below. Its chief advantage 
is that it is more economical of time than the first method, which
measures the ability of the player to use his skills and knowledge 
in a game situation. 

Divide the players into small groups for rating the serve and 
supply each with several balls. Line them up along the baselines, 
all on the same side of the net. (For indoor rating, place the baseline
at official distance from the net or backboard allowing about
five to ten feet between players.) As many as ten can be judged at 
the same time. Their names should be listed on the score sheet in 
the same order as the arrangement of players. The number of repe- 
titions necessary before the raters make final judgment will depend 
largely on the experience of the judges; but the entire process, rat- 
ing the three basic strokes, should not take more than sixty activity 
minutes for a class of forty players. 

For the forehand and backhand, other players are needed for
putting the ball in play. The class can be divided into three groups,
with the throwers stationed near the net, the strokers behind the
baseline on the opposite side of the net from the throwers, and
the ball chasers behind the throwers. When this rating is
conducted indoors, care must be taken to protect the ball throwers
from possible injury. The ball chasers can be eliminated and the
number of strokers may need to be reduced. The ball throwers can
be asked to kneel on the floor behind the net. Since the ability of
the ball throwers affects the ratings, a rotation plan should be used,
to insure that no one player be handicapped throughout the test
by having to work with a poor thrower.
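One simple way to draw up such a rotation plan is sketched below. It is a suggestion only; the text does not prescribe any particular plan. The function name `rotation_schedule` and the player labels are hypothetical. The throwers move along one place after each round, so that a stroker does not meet the same thrower again until every thrower has had a turn with her.

```python
def rotation_schedule(strokers, throwers, rounds):
    """Pair each stroker with a different thrower in each round.

    Throwers shift one position per round, so a stroker meets the same
    thrower again only after len(throwers) rounds.
    """
    if len(strokers) > len(throwers):
        raise ValueError("need at least one thrower per stroker")
    schedule = []
    for r in range(rounds):
        pairs = [(s, throwers[(i + r) % len(throwers)])
                 for i, s in enumerate(strokers)]
        schedule.append(pairs)
    return schedule


# Hypothetical squad of three strokers and three throwers, three rounds.
for number, pairs in enumerate(
        rotation_schedule(["S1", "S2", "S3"], ["T1", "T2", "T3"], 3), start=1):
    print("Round", number, pairs)
```

With three throwers and three rounds, each stroker is fed by each thrower exactly once, so a poor thrower affects all ratings about equally.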

When the game is subdivided, with each stroke being rated 
separately, a chart similar to that in Figure 34 can be used. It can 
be extended to include the volley and lob. Rate on a three or five 
point scale. 


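Name                  Forehand      Backhand      Serve
______________        ________      ________      ________
______________        ________      ________      ________
______________        ________      ________      ________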


































Figure 34. Sample Chart for Tennis Ratings 


ACHIEVEMENT PROGRESSIONS 

Some activities build naturally from one skill or 
ability to another. Usually such classes are organized so that dif- 
ferent members may work more or less at their own rate. Items are 
then checked off as they are accomplished and the student knows 
that he is then ready to advance to other or more difficult items. 
This is true in both swimming and tumbling. 

Proper use of such a chart requires that the student know im- 
mediately the results of his effort. Also, opportunity should be 
given to study the chart, to know what to start on next, to know 
what should be accomplished eventually, and to set own goal or 
rate of work. Such a procedure can be a powerful motivator. 





















A progression chart naturally reflects the teaching plans; i.e., the 
level of skill expected from the group, and the steps by which 
the skills are to be achieved. 

Charts of the type illustrated here can usually be used by mem- 
bers of the class on each other, or by squad leaders. Simply specify 
the standard which constitutes success. If more discrimination is 
needed and adequately trained assistants (squad leaders) are
available, it is no more time consuming to rate as fair and good, i.e.,
1 or 2.

Examples from different activities follow which illustrate the 
points already discussed. 

SWIMMING 


The achievement chart for elementary swimming is
for a series of eighteen to twenty-four lessons for a beginning class
being taught swimming as a safety and recreational skill. Some
members of the class may not finish the last three or four items.
However, items of that difficulty are essential if those who learn
more rapidly are to be stimulated to real effort. If members of
the class start with some ability, then obviously each should finish
the elementary chart and start on an intermediate one. This is a
fairly detailed chart and if time is limited omit items such as
those which are starred, since these represent merely progressions
into something else.

The standards for passing each item might be similar to those
below the chart and are simple enough to be judged by class
members.


STUNTS AND TUMBLING 

The tumbling stunts are divided into groups accord- 
ing to the principal skills involved. The progression chart includes 
stunts of increasing difficulty within each group. The various groups 
are usually carried along simultaneously, with students making 
more rapid progress in one group of stunts than in others. This 
chart is simply illustrative of method and is not meant to indicate 
the events to be taught. 



[Chart columns: flexibility, balance, agility, strength, co-ordination]

Figure 36. Sample Achievement Chart for Stunts and Tumbling





The stunts may be checked off when achieved, or ability may be 
evaluated by a two or three point rating scale. The first procedure 
is probably best if the students check each other, or if the squad 
leaders do the checking. The rating scale is preferable if the in- 
structor is doing the rating. It is possible to weight the stunt for 
difficulty as in the case of the diving. This is good if conducting 
competition between individuals or squads, but is not essential for 
regular class work. 

BOWLING 


The bowling score in itself is indicative of relative 
ability of players. However, it does not analyze strong and weak 
points in one’s game. To be of greatest value in teaching, a record 
should be kept of some of the set-ups which occur frequently in the 
game. The number to be included in the chart will depend upon 
the time allotment. This is an excellent practice device. 

If practice is exclusively on regular games the only set-up which
players face with any consistency is the full ten pin arrangement. The
percentage of strikes or spares can be kept on total frames rolled.
However, a chart of this type gives the score on special set-ups for
the second ball or spare attempt.

An individual score sheet with a continuous record is preferable. 
That is the form indicated here. This makes an excellent teaching 
device and gives a diagnostic record for both teacher and student. 
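The arithmetic of such a record is simple and may be sketched as follows. The example is illustrative only: the set-up labels, the sample attempts, and the function names `percentage` and `setup_report` are assumptions and are not taken from the chart.

```python
def percentage(successes, attempts):
    """Percentage of successes, rounded to one decimal; 0.0 if no attempts."""
    return round(100 * successes / attempts, 1) if attempts else 0.0


def setup_report(record):
    """Map each practice set-up to its conversion percentage.

    `record` maps a set-up label to a list of True/False attempts,
    True meaning the set-up was cleared.
    """
    return {setup: percentage(sum(tries), len(tries))
            for setup, tries in record.items()}


# Strikes kept on total frames rolled: 6 strikes in 30 frames.
print(percentage(6, 30))

# Hypothetical practice record on two special set-ups.
practice = {
    "5-pin": [True, True, False, True],
    "6-10": [False, True, False, False],
}
print(setup_report(practice))
```

A low percentage on one set-up, compared with the others, shows the student where further practice is needed.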

ARCHERY 


Archery is a self-testing activity and it is scored in 
such a way that every arrow shot is rated for its relative accuracy. 
Scoring also permits a partial analysis of performance, i.e., whether 
the shooting is consistent or variable, whether one distance is more 
difficult than a shorter or longer one. The officially recognized 
rounds provide exactly the same kind of a score as that obtained 
from tests in other sports. Other rounds or series of scores may be 
used in the same way. The scores on a given round may be used as a 
measure of performance, or scores on the same round may be
compared at intervals to measure improvement. For the college student
Hyde presents standards of performance on the first and last
Columbia Round, and therefore suggests the amount of improvement
which may be expected from students. (5)

Figure 37. Sample Chart for Bowling Practice

BIBLIOGRAPHY 

1. Alexander, John: "A Motivating Individual Record," Journal of Health and
Physical Education, 10, December 1939, p. 58a

2. Bennett, LeVerne Means: "A Test of Diving for Use in Beginning Classes,"
Research Quarterly, 13, March 1942, p. 109

3. Buhl, Olga Anderson, and Morrill, Warren P.: "The Measurement of Postures,"
Research Quarterly, 12, October 1941, p. 518

4. Crabtree, Helen Kitner: "A Test for Riding," Journal of Health and Physical
Education, 14, October 1943, p. 419

5. Hyde, Edith I.: "An Achievement Scale in Archery," Research Quarterly, 8, May
1937, p. 109

6. Frost, Loraine: "Posture and Body Mechanics," University of Iowa Extension
Bulletin, No. 4J9, March 1940

7. Judd, Mary: A Study of the Distribution of Weight on the Foot Walking with
and Without Shoes. Unpublished M.A. thesis. University of Iowa, 1943

8. Streit, W. R.: "A Stunt Meet for Elementary School Boys," Journal of Health
and Physical Education, 10, December 1939, p. 584

9. Wickens, J. Stuart, and Kiphuth, Oscar W.: "Common Postural Defects of
College Freshmen," Research Quarterly, 13, March 1942, p. 102



8.


Construction of Knowledge Examinations


Written examinations have not been as widely used 
in physical education as have some other types of tests, although 
the teachers who have made use of them have found them to be 
very helpful. Certainly all of us agree that one of our objectives is 
the acquiring of knowledge and most of us agree that a grade in a 
course should reflect the degree to which the student has progressed 
in all of the objectives. We are not content to base grades on at- 
tendance records alone; we should not base them only on improve- 
ment in skill. Practically all of our courses have a knowledge 
content, although the amount varies. 

AVAILABILITY 

Very few knowledge tests in physical education have 
been published. None of the specially organized test building 
agencies, such as the Cooperative Test Service and the various state 
and regional testing bureaus, has prepared tests in physical educa- 
tion. When the demand for such tests is sufficient, doubtless the 
test building agencies will undertake to meet it. Until such tests are 
available, teachers must construct their own and use as guides the 
few good ones that have been published. 

USES 

As pointed out in Chapter 1, written examinations 
serve many purposes other than as a partial basis for assigning 
marks. The teacher can use them to discover what the student 
knows at the outset, and therefore upon what level to begin instruc- 
tion. For example, the teacher of a rhythm course may want to 
know how much the students know about music. 


Another very important use is as a teaching device. Short quizzes 
can be used to stress certain points, or to aid students in sum- 
marizing knowledge of units within a course. Some teachers make 
good use of them in teaching activities where the rules are some- 
what complex, as for example, basketball, field hockey, and soft- 
ball. 

One of the newer and most important uses of written tests is
that made by some of our teacher training departments which are
attempting to determine the needs of their entering students. They
recognize that the backgrounds of these entering students vary
greatly and that it is not wise to require the same courses of all.
For example, if the knowledge and skill of an entering student are
beyond those of the "average" student who has completed the basic
elementary course in any activity, then that student is permitted
some form of election. The written test is used in connection with
skill tests, and other information that the advisor has, to determine 
the requirements. It is obvious that knowledge tests that are used 
for this purpose must be very carefully constructed. Such tests 
usually require more time and effort than the average teacher can 
give, as the various questions need to be analyzed for their efficiency 
and the tests rebuilt on the basis of earlier results. 

SOURCES OF INFORMATION 

For the teacher who must prepare his own examina- 
tions, suggestions are given here. Much has been written on the 
construction of knowledge tests in such subject matter fields as
history, English, and mathematics. Much of what has been written
is applicable to tests in our field, and assistance can
be gained by studying some of the better texts. (4, 8) The discussion
here will be confined to the field of physical education.

(4, 8) See bibliography at end of this chapter for these and other references.

DISTRIBUTION OF CONTENT

The first step is to determine the use to be made of
the examination. If you are going to give the test as a final
examination in a course or use it for classification, then care must be
taken to make it comprehensive. The course outline should be
consulted, and if it is brief, the test constructor will need to elabo- 
rate it, listing all of the important concepts. In general, the larger 
number of questions should be devoted to those things that are 
considered to be of prime importance. Avoid overweighting the 
examination with questions covering just one phase of instruction, 
as for example, rules of an activity. Following the course outline 
rather than the text or rule book will help to avoid this all too 
common failing. 

The teacher who plans to use a ready-made test should consult 
the content distribution to see if it coincides with the emphases 
he has made in his course. An example of an attempt to distribute 
the number of questions according to content is that found in a 
soccer knowledge test, prepared for professional students. 

                                                        Number of
                                                        Questions     %

Analysis of individual techniques (how to do in
  good form) ......................................          3        5
Analysis of game situations and use of skills .....          8       13
General knowledge (history, selection and care of
  equipment, safety precautions, differences
  between field ball, speedball, and soccer) ......          3        5
How to avoid fouling ..............................          3        5
Placement of passes, throw-ins, kicks for goal ....          2        4
Tactics and areas of play .........................         22       36
Rules essential to intelligent play ...............         15       25
Terminology .......................................          4        6
                                                            --
                                                            60
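
As a rough illustration of distributing questions according to emphasis, the sketch below allocates a fixed number of questions across content areas in proportion to weights the teacher assigns. The function name, the sample weights, and the largest-remainder rounding rule are assumptions made for this illustration; they are not taken from the soccer test.

```python
def allocate_questions(weights, total_questions):
    """Split total_questions across content areas in proportion to weights.

    Uses largest-remainder rounding so the counts always add up to
    total_questions (an assumed rule, chosen only for illustration).
    """
    weight_sum = sum(weights.values())
    exact = {area: total_questions * w / weight_sum for area, w in weights.items()}
    counts = {area: int(share) for area, share in exact.items()}
    leftover = total_questions - sum(counts.values())
    # Hand the remaining questions to the areas with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda area: exact[area] - counts[area], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts


# Hypothetical emphasis weights for a short course outline.
emphasis = {"tactics": 5, "rules": 3, "techniques": 1, "terminology": 1}
plan = allocate_questions(emphasis, 60)
for area, n in plan.items():
    print(f"{area:12s} {n:3d} questions  {100 * n / 60:5.1f}%")
```

Comparing such a plan with the course outline is a quick way to see whether one phase of instruction, such as rules, is being overweighted.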



CHOICE OF TYPE OF ITEM 

After deciding the proportions, the next step is 
selecting the type of item that fits the content best. The types most 
useful for physical education will be discussed and illustrated. 


MULTIPLE CHOICE 
1. Forms

There are several forms of multiple choice exercises or items
in use.

Form A: A direct question followed by a number of responses,
only one of which is correct and all others definitely
incorrect.

Example, from a field hockey examination: 

What is the umpire’s decision when the 
ball is sent over the endline, last touched 
by a member of the attacking team from 
within the striking circle? 

(1) Long corner.

(2) Penalty corner.

(3) Penalty bully. 

(4) Twenty-five yard line bully. 

(5) Free hit. 

Here there is but one correct answer, number 4, and all of the 
others are definitely incorrect. 

Form B: A direct question followed by a number of re- 
sponses, all or some of which are acceptable in various degrees but 
one of which is definitely better than any other. This is known as 
the “best-answer” type. 

Example, from a volley ball test for girls: 

What is the best use that can be made of 
the set-up? 

(1) To place the ball in position for 
a spike. 

(2) To pass the ball to a teammate. 

(3) To remove the spin from the ball 
so that it can be played more ac- 
curately. 

(4) To encourage good teamwork.

Here the first answer is definitely better than the others, although
all are acceptable to some degree.

Form C: An incomplete statement with several possible com- 
pletions provided, one of which is to be selected. 

Example, from a tennis test:

In the parallel system of court coverage
in doubles, the server and her partner
should

(1) Take positions at the net immediately
following the delivery of
the first serve.

(2) Remain behind the baseline un- 
til they are able to force the 
opponents into a disadvantageous 
position. 

(3) Go to the net as the ball is being 
returned to their side of the net. 

(4) Assume positions about halfway 
between the baseline and the 
service line, waiting for an op- 
portunity to go to the net. 

Here the answer will depend on what the instructor has taught the 
class, 1 or 2 being considered best, 3 and 4 being definitely incor- 
rect. 

Form D: An identification type of question, with a list or 
key of abbreviations for the choice of answers placed at the top, and 
then a series of questions, with blank spaces provided for answers in 
the left hand column. 

Example, from a basketball test: 

Directions: Select the appropriate answer and 
place the symbol or abbreviation in the blank 
space immediately preceding the question. P, per- 
sonal foul; T, technical foul; TC, technical foul 
charged to the position of captain; V, violation; 
L, legal play. 

.... 15. A player along the free throw lane steps
into the lane as soon as the ball leaves
the hands of the free thrower.

.... 16. A player catches a ball while running
and takes three steps before stopping.

This type of question has been widely used in the Women’s Na- 
tional Officials Rating Committee examinations. The description 
of the situation should be brief, yet give all the necessary informa- 
tion. All the symbols should be used at least once, and they should 
be repeated at the top of each page to avoid errors or waste of 
time in answering the questions. This is particularly important in 
an activity where the symbols are somewhat confusing, as in volley
ball, where “S” may be used for “side out” and “SO” for “serve
over.”

There are other forms with minor modifications, mostly typo- 
graphical in type. The direct question form, either A or B, is better 
than the incomplete statement (Form C) in that the student knows 
from the outset what problem is being presented and is saved the 
time of rereading the stem in connection with the responses. This is 
particularly true when the sentence is long. Form B is preferable 
to Form A in that it tends to test for deeper knowledge. In Form B, 
the student must read all of the responses and then decide which 
one is best. 

There is no particular advantage in having a fixed number of 
responses. It may make the test look neater but it tends to cause the 
test constructor to throw in some responses that may not function, 
that are merely superfluous. If only three plausible responses can 
be contrived, then use only three. Three is the minimum and 
there is seldom any advantage in having more than five. 

2. Uses 

The multiple choice type of test item seems to fit the content 
of most of the subject matter in physical education and is preferred 
for the following reasons: 

1. They can be adapted to test for any depth of under- 
standing. 

2. They can be made completely objective in scoring and are 
easily adapted to answer sheets. (See answer sheet, p. 46.) 

3. It is possible to detect readily any non-functional material 
in the responses, thus facilitating the revision of the question for 
later use. (A non-functional response is one which is never selected.)

4. They test the student’s ability to eliminate incorrect 
responses as well as to select the correct response directly. 

5. They do not require correction for guessing. 

6. They seem to have fewer disadvantages than the other 
commonly used forms: alternate response (true-false, yes-no, etc.), 
matching forms, and recall (completion, analogy, definitions, etc.). 

Use multiple choice when many alternatives are possible, but 
some basis exists for superiority of one. Use it when students can 
be expected to differentiate between closely related points. 




3. Rules for Construction 

1. Use a short, simple, direct question form for the stem. 

2. Avoid choices which are not plausible or which are too 
obvious. 

3. If the directions call for selecting the one correct answer, 
avoid having more than one correct response. (It is preferable to 
state in the directions that the best response is to be sought; that 
there may be more than one but that one will be superior.) 

4. Avoid answering one question by another. 

5. Avoid unintentional clues, such as placement of correct 
or best response consistently in a certain place in the series. An- 
other clue is word matching between the stem and the response. 
Making the correct response consistently longer or shorter than the 
incorrect should be avoided. An example of a grammatical clue is 
the use of a singular expression in the stem and plural ones in all 
but the correct response. Another grammatical clue is the use of 
an incomplete statement in the stem ending in “a” or “an.” (In a 
survey of teacher made examinations, more grammatical clues 
were found in those questions employing an incomplete stem than 
in any others.) 

6. Avoid use of textbook language, if you wish to test for 
ability to use information, for understanding, and not just memo- 
rization. It is considered legitimate to use familiar or stereotyped 
phrasing in an incorrect response occasionally, to deliberately mis- 
lead the shallow thinker. 
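
Some of the clues named in rule 5 can be checked mechanically once the items are typed. The sketch below, with hypothetical function names and a made-up answer key, tallies where the best response falls in the series and counts how often it is the longest response in its item. What counts as "too often" is left to the judgment of the test constructor.

```python
from collections import Counter


def key_position_counts(answer_key):
    """Count how often each response position (1, 2, 3, ...) holds the best answer."""
    return Counter(answer_key)


def longest_is_correct(items):
    """Count items whose keyed response is strictly longer than every other response.

    Each item is a (responses, key) pair, where key is the 1-based
    position of the best response.
    """
    hits = 0
    for responses, key in items:
        keyed = len(responses[key - 1])
        others = [len(r) for i, r in enumerate(responses, start=1) if i != key]
        if all(keyed > n for n in others):
            hits += 1
    return hits


# Hypothetical answer key for a ten-item test.
key = [4, 1, 2, 4, 3, 4, 4, 5, 4, 1]
print(key_position_counts(key).most_common())

# Two items whose responses are quoted from examples in this chapter.
items = [
    (["Long corner.", "Penalty corner.", "Penalty bully.",
      "Twenty-five yard line bully.", "Free hit."], 4),
    (["Volley, chop.", "Lob, volley.", "Half volley, chop.",
      "Half volley, volley.", "Lob, chop."], 2),
]
print(longest_is_correct(items))
```

In the made-up key, position 4 holds the best answer in half of the items, which is the kind of pattern a student can learn to exploit.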

4. Negative Approach 

Sometimes the material may adapt itself to a negative approach,
although the negative approach is to be avoided if possible.
It may be used when the test constructor can find many more cor- 
rect responses than incorrect ones. At times it is almost impossible 
to decide what the best answer would be to a positive form of the 
question. If the negative form is to be used for a number of ques- 
tions, it is advisable to segregate them and label them as negative. 
A substitute for this is to italicize or underline the negative por- 
tion of the stem so that the student is sure not to miss seeing it. 

An example of a negative question from a soccer exami- 
nation: 




The following statements concern trapping the ball with 
the sole of the foot. Which one is false? 

(1) Keep your eyes on the ball rather than on an 
opponent. 

(2) Use this trap for balls that are rolling slowly on 
the ground. 

(3) Put your weight on to the foot that is over the ball, 
clamping the ball between the foot and the ground. 

(4) Place the middle of the sole on the ball, lowering 
the heel slightly. 

(5) Get your body in line with the oncoming ball, so 
that you face it squarely as it approaches. 

Here 3 is a false statement and, therefore, the correct response. 

Another example of a negative question is taken from a track 
and field test. 

Which of the following does not disqualify a throw in the 
javelin event? 

(1) Stepping on the throwing line before the javelin 
leaves the hand. 

(2) Stepping over the throwing line before the javelin 
leaves the hand. 

(3) Javelin failing to stick in the ground. 

(4) Two hands used on the javelin during the ap- 
proach. 

The correct answer is 3. 

Still another example of a negative approach, demonstrating 
the use of the word “least” in the stem, is taken from a body me- 
chanics knowledge test. 

Which of the following exercises would you expect to be 
of least value in minimizing painful menstruation? 

(1) Airplane exercise. 

(2) Bicycle exercise. 

(3) Double knee circling exercise. 

(4) Prone lying, head and shoulder raising exercise. 

(5) Knee-chest exercise. 

The answer is 4. 





5. Scoring 

The number of right responses is the score. This can be 
quickly obtained when using a superimposed key by counting the 
errors and subtracting that sum from the total number of items. 

ALTERNATE RESPONSE FORMS 

In constructing an examination, the teacher will 
be confronted occasionally with content which seems to have but 
two possible responses. Included under the general heading of 
Alternate Response Forms are the true-false, the yes-no, and the 
multiple response. We will consider each briefly. 

1. Forms 

In the true-false, a statement is made and the student indi- 
cates whether it is true or false. Usually the provision for the an- 
swer is made on the test form itself, with the letters T F preceding 
the number of the question. The student encircles the T if he 
considers the statement to be entirely true; the F, if he considers
the statement to be partially or wholly false. Sometimes a blank
space is provided in front of the question, in which the student
places the abbreviation “T” or “F.” This, of course, is not as
satisfactory as the previously mentioned method, due to illegibility of
handwriting. The abbreviation of + for true and - for false has
been tried also, but has the same difficulty of not always being
legible. If the examination is of such length that it extends over
several pages, it will be more economical to use answer forms.
The one illustrated on page 46 can be adapted for either multiple 
choice or alternate response questions, and can be scored with a 
punched key. If the statement is considered true, the student places 
an X in the first row of brackets; if false, in the second row. In this 
way, the same answer sheet could be used for an examination in- 
cluding both multiple choice and alternate response questions. 
The directions may read, for example, that the first twenty-five 
questions are of the multiple choice type and questions twenty-six 
through seventy-five inclusive are of the true-false type. 

In the multiple response form of item, the statement is followed
by three or more responses. The directions call for selecting
all of the answers which are true. There may be no correct answers;
there may be any number of correct answers. An example
of an item in which all of the answers are true follows:

Softball. A runner on third base may score on 

(a) A wild pitch. 

(b) A passed ball. 

(c) A foul tip. 

(d) A throw back from catcher to pitcher. 

(e) A fly ball caught, providing that third 
base is held until the ball is caught. 

A better form of the stem is the direct question. In the
question above the stem should read: On what play may a runner
on third legally score? The question form is again illustrated below.
In this case there are only three correct answers (a, d, e):

Softball. Which of the following players may wear 
gloves but not mitts? 

(a) Pitcher. 

(b) Catcher. 

(c) First baseman. 

(d) Outfielder. 

(e) Shortstop. 

2. Uses 


The alternate response type of item can be used advan- 
tageously when you must cover large amounts of information and 
must economize on the pages of typing, or when little time is to be 
spent in constructing an examination and its use will be limited 
to motivation or instructional purposes. Another time when it 
can be used is when the information is merely factual and testing 
is for the ability to memorize, and there is no desire to test for the 
ability to make applications. Alternate response questions can be 
good or poor. 


3. Rules for Construction 

1. Make the statements or questions brief and direct. 

2. Avoid ambiguities. 

3. Avoid textbook wording. 

4. Have an approximately equal number of each alterna- 
tive, with no regular pattern to responses. 





4. Scoring 

The scoring recommended for true-false, yes-no, and plus- 
minus types is usually “rights minus wrongs.” This is more quickly 
computed by taking the total number of items minus twice the 
errors. If you have directed the students not to guess, then they 
should be penalized for errors. Thus, the scoring would be the 
total number of items minus twice the number of the errors, minus 
the number of omissions. In the multiple response type, an omis- 
sion is obviously an error and should be scored accordingly. 
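
The scoring rules above reduce to simple arithmetic. The sketch below is one way to write them down; the function names and the sample figures are assumptions for illustration. It also shows that "total items minus twice the errors, minus omissions" is the same quantity as rights minus wrongs.

```python
def number_right(total_items, errors, omissions=0):
    """Multiple choice or recall score: the count of correct responses."""
    return total_items - errors - omissions


def rights_minus_wrongs(total_items, errors, omissions=0):
    """Alternate response score corrected for guessing.

    Computed the quick way described in the text: total items minus
    twice the errors, minus the omissions.
    """
    return total_items - 2 * errors - omissions


# Hypothetical paper: 50 true-false items, 6 errors, 4 omissions.
total, errors, omitted = 50, 6, 4
rights = number_right(total, errors, omitted)
print(rights)                                        # 40
print(rights_minus_wrongs(total, errors, omitted))   # 34
print(rights - errors)                               # 34, the same value
```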

RECALL 
1. Forms 

Another grouping of forms is the Recall, in which a short 
response is expected associated with the question raised. One form 
uses a stimulus word or phrase, with a blank space provided for the 
student’s answer (a single word or phrase). 

Example: Measurement of width of hips: caliper. 

This type of form varies in difficulty from one extreme to the other. 
If the student supplies an equally good word but not the one the 
examiner had in mind, it may be marked wrong, thus making the 
examination tend to be too difficult and more of a guessing contest 
than a knowledge test. To prevent this from occurring the test con- 
structor may be so definite in the part he supplies that the answer 
is obvious. 

The analogy form is illustrated below: 

I.Q. : Intelligence = .... : Motor Ability 

(Answer: M.Q.) 

It has a very limited use. 

The sentence completion, where the student is asked to sup- 
ply the missing word, is generally considered very poor. It tends 
to test for the trivial or to be a test of intelligence or vocabulary 
rather than a test of the content of the course. Its use should be 
avoided. 

A better type is the short essay or sentence, which calls for a 
single sentence reply or for listing. 




Example, from a basketball test: 

What are the advantages of shifting zone defense 
over player-to-player defense? List four. 

1.

2.

3.

4.

The difficulty here is in the scoring. The player may list four 
but they may not be the four which are most important. Or he may 
combine two under one number, leaving a space blank, or perhaps 
filling it incorrectly. 

2. Use 

It is a better form for instructional purposes than for testing 
purposes. Recall questions are generally considered to be more 
difficult than recognition questions. Their main use is in identifica- 
tion of terms, or for purely factual information. They could well 
be used for quizzes in anatomy or kinesiology, in asking for some 
such information as insertions and origins of muscles. 

3. Rules for Construction 

1. Be sure that there is only one correct answer. 

2. Objectivity depends on brevity of the answer. 

3. If spaces are provided, they should be of uniform length, 
or they will serve as a clue. They should be long enough to take 
care of the longest reply. 

4. Space for answers should be provided in or near a margin, 
to facilitate scoring. 

4. Scoring 

They should be scored according to the number of correct 
responses. 

MATCHING 
1. Forms 

Still another type of question is the matching exercise. This
is sometimes classified with multiple choice. There are two common
forms: two columns of single words; or one column of words or
names, with one of phrases or explanations.

2. Uses 

Matching forms may well be used for the “who, when, and
where” type of information. They are obviously weak in measuring
interpretative abilities.

3. Rules for Construction 

1. The second column should contain the responses and 
should always have more items than the first column, to prevent 
answering the difficult ones on the basis of elimination alone. The 
items in this column should be numbered. 

2. Blank spaces, for recording the number of the matching 
item should be placed in front of the items in the left hand column. 

3. Clues, such as grammatical form, proper names or capi- 
talization, should be avoided. 

4. The list should have homogeneous content. 

5. The directions should state whether items in column two 
may be used more than once. 

6. The instructions should be specific on the basis on which 
connections are to be made. 

7. The response column should be arranged in sequence, 
alphabetically or numerically. 

4. Scoring 

The score is the number of correct responses. If several 
choices are possible, correction for guessing must be made. (Total 
items minus twice the errors, minus one for each omission.) 

USE OF MISCELLANEOUS DEVICES 
Diagrams 

Diagrams should be used when questions involve spatial rela- 
tions, or wherever they can make the situation more clear. Some- 
times they actually save space. The examination should be arranged 
so that all the questions making use of a certain diagram are placed 
on the same page with it; or the diagrams may be placed on a
separate sheet of paper, each labeled with the question numbers.

The use of diagrams in connection with questions involving the 
flight of an object is illustrated with questions from a golf examina- 
tion. 



1. Of the flights shown in the diagram (Figure 38) which most
closely approximates that of the midiron?

(1) a.

(2) b.

(3) c.

(4) d.

(5) e.

2. To make a stroke with a flight similar to a (Figure 38), 
what should you do? 

(1) Stand farther from the ball than for a stroke made 
with a wooden club. 

(2) Check the follow through in the direction you want 
the ball to go. 

(3) Take a stance with the ball nearer the rear foot than 
the front one. 

(4) Keep the face of the club closed. 

(5) Aim just above the center of the ball. 

Note: Answers: 1, 5; 2, 3. 

Sometimes one diagram can be used not only to save words but 
to save the time necessary for mental imagery. It also has the ad- 
vantage of providing the player with a more natural situation. 
An example of such a diagram is taken from a basketball test for 
girls: 



[Figure 39. Basketball Diagram, showing player positions on a court marked North End and South End.]

27. (See Figure 39.) G6 has the ball out-of-bounds at the
end line. The “small letter” team is employing a shifting
zone defense. If the ball is passed from G6 to G5
to G4, what should f5 do?

(1) Move toward G5.

(2) Remain where she is.

(3) Move toward f4.

(4) Move closer to center of division line.





28. (See Figure 39.) If the ball is passed above the reach
of g1 to a forward at the spot marked X, what should
g2 do?

(1) Move toward the forward at spot X. 

(2) Remain where she is. 

(3) Move toward the northeast corner. 

Answers, depending somewhat on style of defense taught: 27, 3; 
28, 3. 

SUBSTITUTES FOR DIAGRAMS 

Sometimes a tabulation similar to those below can be 
incorporated in the responses themselves to save lengthy descrip- 
tions. 

Examples, from bowling: 

22. What is the score at the end of the third frame in the 
following game? 

Frame    1st ball    2d ball
  1         6           4
  2         5           2
  3         8           1

(1) 26.

(2) 31.

(3) 33.

(4) Correct answer not listed. 

(5) Incomplete. 

23. What is the score at the end of the third frame in this 
game? 

Frame    1st ball    2d ball
  1         6           4
  2         1           9
  3         8           2

(1) 30.
( 2 ) Z'- 
(3) 39.

(4) Correct answer not listed. 

(5) Incomplete. 




24. What is the score at the end of the third frame in this 
game? 


Frame    1st ball    2d ball
  1        10           0
  2         7           3
  3         2           7

(1) 29.

(2) 38.

(3) 41.


(4) Correct answer not listed. 

(5) Incomplete. 

25. At the end of the seventh frame the score was 100. 
What is the score at the end of the tenth frame? 


Frame    1st ball    2d ball
  8         9           1
  9        10           0
 10         8           0

(1) 128.

(2) 138.

(3) 146.


(4) Correct answer not listed. 

(5) Incomplete. 

The correct answers: 22, 2; 23, 5; 24, 3; 25, 3. 
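
The arithmetic behind these four answers can be sketched in a few lines. The function below is an illustration only: its name and the `carried` parameter are assumptions, it takes frames as (first ball, second ball) pairs in the same way as the tables, writes a strike as (10, 0), and does not model the extra balls allowed after a strike or spare in the tenth frame.

```python
def score_through(frames, carried=0):
    """Return the running score at the end of the last frame listed.

    frames  -- list of (first_ball, second_ball) pin counts; a strike is
               written (10, 0), as in the tables.
    carried -- score already on the sheet before the first frame listed.

    Returns None when the last frame cannot be scored yet because a
    strike or spare still waits on balls that have not been rolled.
    """
    rolls = []   # every ball actually rolled, in order
    starts = []  # index in rolls of each frame's first ball
    for first, second in frames:
        starts.append(len(rolls))
        rolls.append(first)
        if first != 10:          # no second ball after a strike
            rolls.append(second)

    total = carried
    for (first, second), start in zip(frames, starts):
        is_mark = first == 10 or first + second == 10
        # A strike counts itself plus two more balls; a spare counts its
        # two balls plus one more; either way three rolls from start.
        needed = 3 if is_mark else 2
        if start + needed > len(rolls):
            return None
        total += sum(rolls[start:start + needed])
    return total


print(score_through([(6, 4), (5, 2), (8, 1)]))                 # question 22
print(score_through([(6, 4), (1, 9), (8, 2)]))                 # question 23
print(score_through([(10, 0), (7, 3), (2, 7)]))                # question 24
print(score_through([(9, 1), (10, 0), (8, 0)], carried=100))   # question 25
```

The four calls give 31, None (the frame is incomplete), 41, and 146, which agree with the answers listed for questions 22 through 25.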

The same sort of thing can be done in rhythmical form and 
analysis tests. 

Example: 28. What is the time signature for the waltz?

(1) 2/4.

(2) 3/4.

(3) 6/8.

(4) 4/4.

(5) 5/4.

29. In which rhythmical pattern is the time syncopated?

[Responses (1) to (4) are rhythmical patterns printed in musical notation.]

Correct answers are: 28, 2; 29, 3.




CHECK LIST FOR EVALUATING ITEMS 

After the items have been prepared, the use of the 
following check list * will prove helpful in evaluating them. 

(1) Exactly what is this item intended to measure? 

(2) Is the intended purpose of the item acceptable? Is it im- 
portant that the item be included; does it test for something sig- 
nificant? 

(3) Is there any ambiguity in the item? Will the student recog- 
nize the purpose of the item? Can it be made more clear? Are there 
any qualifying phrases that might start the student to thinking 
along an irrelevant line? 

(4) Does the item contain any unintentional clues to the correct 
response? 

(5) Will authorities agree on the correct response? Are the re- 
sponses which are intended to be wrong really less acceptable than 
the correct one? 

(6) In multiple choice items are any of the wrong responses 
likely to appear more plausible than the correct answers to the best 
of the students to be tested? Is the item too difficult for the best 
students in the group? 

(7) Is the item phrased as economically as possible? Is it straight 
forward, direct? 

(8) Is the form of the item as well adapted as any to its intended 
purpose? Would a diagram help? 

(9) Would the rote learner have any undue advantage in re- 
sponding to the item? Has “textbook” language been avoided? 

Another method of evaluating questions before trying them on 
a group is to have some other person go over them. Re-reading the 
questions yourself after the lapse of a few days is also a good pro- 
cedure. 

For help in the technical problems involved in setting up 
the examination and administering it, see Chapter 3, pages 44 
to 47. 


* Check list adapted from class notes: Improvement of the Written Examination, 
taught by Dr. E. F. Lindquist, University of Iowa. 




As a guide to proper administration, use this check list: 

(10) Is the provision for the student’s response as economical of 
his time as possible? 

(11) Are the directions to the student as simple and understandable
as they can be?

(12) Does the provision for the student’s response provide for 
convenient, accurate, and economical scoring? 

(13) Can the typographical arrangement of the items be im- 
proved? 

(14) Is the spread of estimated difficulty of items adapted to the 
spread of ability in the group to be tested? (See Chapter 9, p. 230.) 

(15) Are the time limits adequate? 

(16) Are the questions placed in the test in an order progressing 
from easy to difficult, as estimated? Will the slow student be pre- 
vented from spending an undue amount of time on items that are 
too difficult for him? 

If a test is to be a valid test, that is, if it measures what it purports 
to measure, the following criteria should be considered: 

(17) Are any important objectives or outcomes of instruction 
seriously neglected in the test as a whole? 

(18) Is the emphasis on functional value and not on content 
objectives? 

(19) Is there any undue testing of isolated detail or unimportant 
items of information, such as terminology or definitions, for their 
own sake? 

(20) Are test situations suggestive of the life situations in which 
the student may make actual use of what he has learned? 

(21) Would this test be less satisfactory if used as an “open book” 
test? Are there a sufficient number of questions which require 
drawing of inferences and making of applications? 

Obviously, these are rigid criteria and it is doubtful if any test 
ever meets all of them; but they are goals toward which the con- 
scientious may strive. Some may have to be sacrificed somewhat for 
the sake of others, for example, number ten. The answer sheet re- 
quires more of the student’s time; it has been estimated that the 
students can answer about ten percent fewer items when using an 
answer sheet than when writing directly on the test forms. This 
percentage is reduced when students become accustomed to using 




are used; otherwise scoring is too difficult. Arrange the questions 
in order of estimated difficulty, ranking from easy to hard. The 
copy given the typist should be readable and in as nearly as possible 
the exact form that you wish it. Be sure to: 

1. Provide a space for the student’s name and other essential 
data, either on answer sheets, or on test forms if answer sheets are 
not to be used. 

2. Indicate amount of space to be left for diagrams. 

3. Ask that the questions be arranged so that the diagrams can 
appear on the same page with the questions that refer to them. 

4. Ask that no questions be split; all of the items should appear 
on the same page with the stem. 

5. If abbreviations or symbols are to be used ask that they be 
repeated at the top of each page. 

6. Ask to proofread the stencils before the examination is re- 
produced. 

ADMINISTERING THE EXAMINATION 

Suggestions for administering the examination and 
scoring the papers are included in Chapter 3. Conversion of raw 
scores into letter grades is covered in Chapter 9. 

HOW TO CHECK ON THE EFFECTIVENESS 
OF EXAMINATIONS 

Perhaps a few statements should be made that will 
aid teachers who wish to revise and improve their tests. It is as- 
sumed that the teacher who takes considerable time and care in pre- 
paring an examination will want to use it again and will, therefore, 
not permit the students to retain their examination papers. We 
will not deal with the evaluation of questions by statistical methods 
here (see Chapter 9) but will list a few criteria for subjectively 
determining a general estimate of the worth of the examination 
in its entirety: 

1. Did it provide a wide range of scores, with no undue massing 
of scores at any one point along the scale? 




2. Does the order or rank of scores coincide roughly with your 
previous estimate of the abilities of individuals within the group? 

3. Was the examination sufficiently comprehensive? Did it cover 
all the important phases of instruction? 

4. Was it the right length? Would the students have had time 
to answer more questions? 

5. Were the questions clear? Did the discussion of papers reveal 
any ambiguities? 

6. Did the examination indicate areas of content which need no 
further instruction? Did it reveal inadequate understanding of cer- 
tain phases of subject matter? 

REVISING EXAMINATIONS 

In revising examinations, the results previously ob- 
tained need to be carefully studied. Sometimes the difficulty of 
questions needs to be adjusted to the anticipated ability of the 
group. If you wish the total scores to differentiate between members
of a certain section of the class, then quite a few questions
with difficulty corresponding to their ability should be included. 
The following question, from a tennis examination administered 
to a group of eighty-four physical education major students, proved 
easy: 

What strokes, in addition to the serve, forehand drive, 
and backhand drive, are most important for the be- 
ginner to master? 

(1) Volley, chop. 

(2) Lob, volley. 

(3) Half volley, chop. 

(4) Half volley, volley. 

(5) Lob, chop. 

The answer, 2, was selected by seventy-five of the eighty-four, 
or eighty-nine percent. This gives the question a difficulty rating 
of only eleven. Of the nine who failed to select the right answer, 
three selected response 1, two selected response 4, three selected
response 5, and one omitted the question. The question was retained
for further use because it discriminated well between the good and 
poor students, but response 3, which was not selected by anyone 
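
The figures quoted for this question can be reproduced with a short tally. The sketch below assumes, as the numbers in the text imply, that the difficulty rating is the percentage of students who did not answer correctly, and it flags any response that nobody selected. The function name and data layout are illustrative only.

```python
def item_summary(choice_counts, omitted, key):
    """Summarize one multiple choice item.

    choice_counts -- dict mapping response number to how many chose it
    omitted       -- number of students who skipped the item
    key           -- number of the correct (or best) response
    """
    answered = sum(choice_counts.values())
    total = answered + omitted
    percent_right = 100 * choice_counts[key] / total
    return {
        "students": total,
        "percent_right": round(percent_right),
        # Assumed definition: percent who missed or omitted the item.
        "difficulty": round(100 - percent_right),
        "non_functional": sorted(r for r, n in choice_counts.items() if n == 0),
    }


# Tennis question: 84 students, keyed answer 2.
tennis = item_summary({1: 3, 2: 75, 3: 0, 4: 2, 5: 3}, omitted=1, key=2)
print(tennis)
```

With the counts reported for the tennis question, the tally gives 89 percent right, a difficulty rating of 11, and response 3 as the only one never selected.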





Court Areas

[Court diagram with these labels: back boundary line, singles service
court, doubles service court, short service line, net, side boundary
line for singles, side boundary line for doubles.]

Scoring: The ladies’ singles game consists
of .... points, ladies’ doubles .... points,
and mixed doubles .... points. In doubles,
when the score is “13 all,” the side which
first reached 13 has the option of “setting”
the game to .... and when the score is “14
all,” the game may be set to .... In the
singles game, when the score is “9 all,” the
game may be set to .... and when the score
is “10 all,” the game may be set to .... A
match consists of .... games.

Differences Between Doubles and Singles: 

The side alleys are used only in the ....
game. In the .... game, the long service line
is the same as the back boundary line. In the
.... game, the service area is wide and short,
compared to the .... game, where the service
area is narrow and long.

Take a pencil and shade the area in the 
diagram that represents the doubles service 
court. Then do likewise for the singles 
service court. 


Strokes (Diagrams indicate the line of flight of the bird) 



Name: 

Uses: 

How to return it: 










Name: 

Uses: 


How to return it: 

Name: 

Uses: 

How to return it: 

Name: 

Uses: 

How to return it: 

Name: 

Uses: 

How to return it: 





Equipment: How should racquets and shuttlecocks be cared for?

Terminology: Define the following terms.

Fault: 

Home plate: 

Side out: 

Slung: 

Hand out: 

Inning: 

Systems of Court Coverage in Doubles: 


Name Advantages . . 

Disadvantages 

Name Advantages . . 

Disadvantages 

Name Advantages . . 


Disadvantages 


Self-Inventory: 

1. How many of the following strokes do I actually use in a game? 

a. short serve 

b. long serve 

c. clear 

d. hairpin drop shot 

e. cross court drop shot 

f. long drop shot 

g. smash 

h. drive 

2. What strategy do I employ?

a. Placement of serve to opponent’s back- 
hand. 

b. Placement of all strokes to open areas, 
thus forcing opponent to move. 

c. Use of drop shot to lead opponent into 
setting bird up for a smash. 

d. Placing strokes to the vulnerable back-
hand rear corner.

e. Feint.

f. Change of pace.




Part I. Rules Chart. Fill in the blank spaces in the chart below and on another piece of 
paper, make a similar form, extending the chart. Use this chart in reviewing for 
examination. The first one has been filled in as an example. 


Description of what happened            Foul, violation,   If foul,     Name       Penalty
                                        or legal play      what type

 0. Forward in act of shooting pushes   Foul               Personal     Charging   1 free throw
    guard away from her by placing
    the ball against guard's chest and
    pushing with it.

 1. One player in possession of the
    ball is guarded between two play-
    ers and is unable to pass the ball
    before the three second time limit
    has been reached.

 2. Player, by use of personal con-
    tact, impedes the progress of an
    opponent who has started to ad-
    vance the ball by means of a
    bounce or a juggle.

 3. Player restricts the freedom of
    movement of an opponent with-
    out the ball by disregarding the
    ball and shifting her position as
    the opponent moves.

 4. Player in act of shooting is
    bumped from the rear by an
    opponent. Basket is missed.

 5. As above (4) except basket is
    made.

 6. Players are jumping for rebound.
    Red player gets both hands on
    the ball; Blue player gets one
    hand on the ball slightly later,
    and thinking she has legally tied
    the ball keeps hand on it.

 7. Player snatches the ball out of
    the hands of an opponent who
    has it legally in her possession.

 8. Player fumbles the ball in catch-
    ing it, recovers it from the floor,
    and then bounces it.

 9. On toss up, one of the players
    taps the ball twice in succession.

10. Team has had three time outs.
    Captain requests a fourth time
    out.






Figure 41. Sample Work Sheet for Basketball 




















was dropped. This particular question could probably be made 
more difficult by removing the clause in the stem, and adding those 
three strokes at various places in the responses. 
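
The arithmetic behind the tennis example can be sketched in a few lines of Python. This is only an illustration: it assumes, as the figures above imply, that the difficulty rating is 100 minus the percentage of students answering correctly. The function name and the layout of the tally are hypothetical.

```python
def item_analysis(choice_counts, correct, n_students):
    """Summarize one multiple-choice question.

    choice_counts maps each response number to the number of students
    who chose it (omissions are simply not counted).
    Assumption: difficulty rating = 100 - percent answering correctly.
    """
    percent_correct = round(100 * choice_counts.get(correct, 0) / n_students)
    difficulty = 100 - percent_correct
    # Responses nobody chose are not functioning as distractors.
    unused = [r for r, c in sorted(choice_counts.items()) if c == 0]
    return percent_correct, difficulty, unused


# Tennis question: 84 students, answer 2 correct, one student omitted it.
counts = {1: 3, 2: 75, 3: 0, 4: 2, 5: 3}
pct, difficulty, unused = item_analysis(counts, correct=2, n_students=84)
print(pct, difficulty, unused)
```

A response that appears in the unused list, such as response 3 here, is a candidate for dropping or rewriting.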

A question which proved difficult for the same group (difficulty 
rating was 93) was as follows: 

Girls’ volley ball. The ball is served by team A to the 
center back player on team B. What should the center 
back do? 

(1) Set the ball up to herself and spike it. 

(2) Set the ball up to a center row player. 

(3) Set the ball up to herself and pass to a spiker in 
front row. 

(4) Set the ball up to herself and pass to a short player 
in the front row. 

(5) Return the ball to the rear of team A’s court. 

The greatest difficulty encountered was in selecting between the
second and fourth items, although the errors in selection were dis-
tributed well enough to cause all parts to function. There are
several ways in which the question could probably be made easier: 
eliminate item 2, or indicate the need for taking the spin off the 
ball by setting it up to self (do this in the stem), or add the word 
"immediately” or "directly" to item 5. 

Care must be taken in attempting to revise a question to keep 
the intended purpose of the question in mind and not to make it 
easier by the simple device of throwing in responses that no one 
will select. A question with a long and involved stem can be clari- 
fied sometimes by breaking it into two sentences. 

TESTS AS TEACHING DEVICES 

Sometimes teachers will want to use tests for instruc- 
tional purposes, with no desire to use them as a partial basis for 
grading. One such device is the badminton worksheet presented 
here. This and similar worksheets can be used while students are
resting between periods of play, in connection with the showing of 
movies, or can be filled in outside of class time. A master sheet 
with correct answers can be posted, if the teacher does not have 
the class time to go over the answers with the students. The work- 




sheets may be retained by the students, and they may make addi- 
tions to them, to use in reviewing for the course examination. The 
experienced teacher will have no difficulty in preparing similar 
worksheets for his classes, selecting those things which he considers 
most important to stress. 

The badminton worksheet illustrates a justifiable use of the com- 
pletion type of exercise. The purpose here is to provide the student 
with the correct information, not to test him. An effort should be 
made to encourage the student to make comparisons and to think 
for himself, rather than merely transferring answers directly from 
a text or lecture. If the worksheet is to be corrected by the teacher 
the form should be changed with blank spaces placed in the margin. 

Portions of worksheets * in other activities follow, illustrating 
one method of handling the content. 

FOLK (OR COUNTRY) DANCE 

Characteristics of specific dances: 

Name of dance No. in dance Origin Steps Floor pattern 

1 

2 

3 


10

TENNIS 
A. Singles 

1. Draw a diagram of the court and indicate the “vital area” 
and “no man’s land.” 

2. Explain the reason for returning to a position behind the 
center of the baseline after each stroke. 

* These illustrations are adapted from worksheets prepared for use in college
classes, University of Minnesota, by staff members: Folk or Country Dance, Mary 
Virginia Gardner and Elizabeth Kratz; Tennis, Virginia Pettigrew; Archery, Eloise 
Jaeger and Catherine Snell; Badminton and Basketball, Esther French; Posture 
and Conditioning, Ellen Kelly. 



B. Doubles 

1. Explain the parallel system of court coverage.

2. List the disadvantages of the “up and back” system of 
court coverage. 

ARCHERY 

A. List the common errors made in shooting, the result, and the 
correction. 

     Error               Result               Correction

1.

2.

3.

4.

5.

6.

B. Describe the following types of competition (on a separate sheet 
of paper): 

1. Columbia 4. York 7. Roving 

2. American 5. Clout shoot 8. Flight shoot 

3. National 6. Archery golf 

C. Supply answers on a separate sheet: 

1. What are the causes of the arrow falling off the hand? 

2. How should arrows be removed from the target? 

3. How should the bow be bent? 

4. How should the arrow be nocked? 

5. What are some of the causes of injury? 

6. Draw a diagram of an arrow, naming the parts. 

POSTURE AND CONDITIONING 

A. Describe or name exercises for various parts of the body and 
various purposes. 

1. Improve circulation (warm-up and flexibility). 

2. Abdominal strength.

3. Arm and shoulder strength. 

4. Upper back strength. 

5. Foot strength and correct use. 




6. Leg strength. 

7. Relaxation. 

8. Body alignment. 

(This same type of knowledge can be obtained by list-
ing the exercises vertically and the purposes horizon-
tally, thus providing a checkerboard diagram for check
marks.)

B. Supply answers on a separate sheet. 

1. Standing

a. How should one’s body weight be distributed over 
the feet? 

b. When viewed from the side, which landmarks 
should be in a vertical, straight line over the middle 
of the foot? 

c. When viewed from the back, which landmarks 
should fall in a straight, vertical line centered be- 
tween the feet? 

d. What should you do in testing your posture against 
a wall? 

2. Sitting 

a. List the characteristics of a properly fitting chair 
and a desirable posture for active work and study.

b. How can fatigue and backstrain be minimized in 
sedentary jobs? 

c. Describe an attractive and efficient form in getting 
into and out of a chair. 

d. What is a good sitting position for recreational 
reading? 

3. Walking and running 

a. How can the arches be protected from undue strain 
in walking and running? 

b. What are the factors which add or detract from the 
appearance in walking? 

4. Lifting 

a. What form should be used in lifting heavy loads 
with a minimum of fatigue and without danger of 
back injury? 

b. How can greatest force be applied in lifting? 




When tests are to be used for teaching purposes only, it is pos- 
sible to make use of many of the forms that can not be used in 
objective tests because of the difficulties involved in scoring. Some 
of the forms that have proven valuable are definitions, short essays, 
and “listings.” Be sure to keep the purpose in mind and avoid the 
use of teaching types of questions except in those examinations 
which are aimed primarily as teaching devices. 





9

Simple Statistical Procedures


In the belief that it is important for all teachers of 
physical education to have at least a speaking acquaintance with 
simple statistical procedures, this chapter is included. Many excel-
lent textbooks are available for teaching methods of statistical com-
putation (2, 4, 8). Texts which point out the limitations of the various
devices are recommended, because it seems to the authors that there 
is danger in the misuse of techniques by persons with only a limited 
understanding of the underlying assumptions that are made in the 
development of each technique. The importance of applying logic 
to research cannot be over emphasized. 

THE FREQUENCY DISTRIBUTION 

Let us suppose that you have given a test to a group 
of individuals and have one score for each individual. To get any 
generalized concept of the performance of the group and of the 
meaning of each score, it is necessary to arrange them in some 
manner. To apply the example to physical education, take the 
scores obtained from forty-three college girls on a badminton clear 
test (see p. 51 for the test description). The test was given at the 
end of seven periods of instruction. A perfect score on the test is 
100. 


TABLE XV 

Scores of Forty-three Students on the Badminton Clear Test 


45    6   75   60   78   47   65   65   63   32
22   35   56   74   54   72   64   72   28   39
47   54   73   59   78   64   63   69   86   49
85   69   64   79   60   60   80   52   80   84
61   66   62



(2, 4, 8) See references in bibliography at end of this chapter.


The raw scores, as presented in Table XV, are not very mean- 
ingful or helpful in evaluating the relative performance of any 
individual within the group. It is apparent that no student made 
a perfect score, and also that the highest score made was 86 and 
the lowest score 6; this arrangement does make it possible to ob- 
tain the range. (The range is the difference between the highest 
and the lowest score in the series.) 

The scores can be interpreted better if all the possible scores 
within the range are listed in order of size, and then the number of 
times each score was made is recorded, as has been done in Table 
XVI. 
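
For the teacher who keeps scores on a computer, the range and the tally of a simple frequency distribution can be sketched as follows. The scores are the forty-three badminton clear test scores as read from Tables XV and XVI; the variable names are merely illustrative.

```python
from collections import Counter

# The forty-three badminton clear test scores (Tables XV and XVI).
scores = [45, 6, 75, 60, 78, 47, 65, 65, 63, 32,
          22, 35, 56, 74, 54, 72, 64, 72, 28, 39,
          47, 54, 73, 59, 78, 64, 63, 69, 86, 49,
          85, 69, 64, 79, 60, 60, 80, 52, 80, 84,
          61, 66, 62]

# Range: difference between the highest and the lowest score.
score_range = max(scores) - min(scores)

# Simple frequency distribution: how many times each score was made.
freq = Counter(scores)

print("N =", len(scores), " range =", score_range)
for s in range(max(scores), min(scores) - 1, -1):
    if freq[s]:                      # list only scores that occurred
        print(s, freq[s])
```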

TABLE XVI 

Simple Frequency Distribution of Scores on Badminton Clear Test 


(Intervals of one unit)

 S   F      S   F      S   F      S   F      S   F
86   1     69   2     52   1     35   1     18
85   1     68         51         34         17
84   1     67         50         33         16
83         66   1     49   1     32   1     15
82         65   2     48         31         14
81         64   3     47   2     30         13
80   2     63   2     46         29         12
79   1     62   1     45   1     28   1     11
78   2     61   1     44         27         10
77         60   3     43         26          9
76         59   1     42         25          8
75   1     58         41         24          7
74   1     57         40         23          6   1
73   1     56   1     39   1     22   1
72   2     55         38         21
71         54   2     37         20
70         53         36         19




The more frequently occurring scores now stand out, the points 
of concentration of scores are apparent, and the number of scores 
between any two given points can be quickly secured by simple 
addition. (The table is usually arranged in a linear form, instead 
of in the separate five columns shown here, an arrangement to con- 
serve space. When in linear form the graphic distribution of scores 
is more apparent.) But Table XVI needs to be condensed into 
“classes” of scores, for convenience in handling. Each class or 




“step-interval” will include all the records of scores within the 
limits of that interval. The size of the step interval will depend, 
somewhat, upon the use to be made of the data. There is seldom any 
need for fewer than twelve intervals or more than twenty. Dividing 
the range by fifteen will give a quick approximation of the size 
of the interval. Example, range: 86 (highest score) minus 6 (lowest 
score) or 80; 80 divided by 15 is 5.3, or in round numbers, 5. For
ease and accuracy in tabulation, an interval of 1, 2, 3, 5, 7, 10, 15,
or any higher multiple of 5 is preferred. The increase or decrease in 
the number of classes by one or two is not important. Table XVII 
shows the grouped frequency distribution of the same data as used 
in Tables XV and XVI, using intervals of five units. 

TABLE XVII 

Grouped Frequency Distribution of Scores on Badminton Clear Test 


(Intervals of Five Units)

  S         Tab          F
83-87       ///          3
78-82       /////        5
73-77       ///          3
68-72       ////         4
63-67       ///// ///    8
58-62       ///// /      6
53-57       ///          3
48-52       //           2
43-47       ///          3
38-42       /            1
33-37       /            1
28-32       //           2
23-27                    0
18-22       /            1
13-17                    0
 8-12                    0
 3-7        /            1

                    N = 43

This table could be made more compact by increasing the size 
of the step-interval. By using an interval of 25 units, for example, 
only four classes would be needed. But such an increase in the size 
of the unit would mean a larger loss in the identity of the original 
scores. If an interval of 25 were used, over half of the scores would 




fall in one interval (63-87), thus hiding most of the characteristics
of the original distribution. If only a very rough picture of the
distribution of scores is needed, then a very broad interval may
prove satisfactory. If any detailed study is to be made, or if high
precision in description is desirable, then the interval used should
be small.


The steps in constructing a grouped frequency distribution, 
then, are as follows: 

1. Prepare a data sheet with the three headings, as shown in 
Table XVII. (The abbreviation S is for scores; Tab for tabulation 
of tally marks; and F for frequencies, or the number of times that 
a score occurs within that interval.) 

2. Determine the range and divide by 15. (Carry the results to 
only one decimal place.) 

3. Select from the following preferred list the number nearest 
the quotient secured in Step 2: 1, 2, 3, 5, 7, 10, 15, or any higher 
multiple of 5. 

4. Write the limits of each interval, in descending order, in the 
first (S) column of the table, beginning at the top with the interval
containing the highest score. Determine these limits as follows: 

(a) When the number of units in the interval is an even 
number, the lower limit of each interval should be a 
multiple of this number. 

(b) When the number of units in the interval is an odd 
number, find the multiple of this number that is near- 
est to the highest score in the series. (In the example, 
using an odd number for the interval (5), with the 
highest score being 86, the nearest multiple is 85.) 
Select the limits of the interval so that this multiple is 
the middle score in the interval. Thus the midpoints 


of all the intervals will be a multiple of the interval 
unit size. (See example, Table XVII.) The reason is for 
convenience only. Because of the loss of identity of the 
original scores in any grouped frequency distribution, 
it is necessary in later computations to use the mid- 
point to represent the value of the scores in the in- 
terval. It is convenient, then, to have the midpoint be 
a whole number, rather than a decimal value, and also 




to have it be a multiple of the step-interval. Note that 
the use of an interval containing an odd number of 
units results in a more convenient midpoint, as the 
midpoint of an even interval will be a decimal value. 

5. Tabulate the scores, by placing a tally mark opposite the ap- 
propriate interval. 

6. Record the number of tally marks opposite each interval 
in the frequency column and add, as an immediate check on the 
accuracy of tabulation. (See N, the symbol for the total number of 
cases, in Table XVII.) 

The frequency distribution has been described in detail because 
it is the basic step for practically any statistical procedure. 
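
The six steps can be sketched in Python as a check on hand tabulation. This is an illustration under stated assumptions rather than a prescribed routine: the preferred list is cut off at 30 (the text allows any higher multiple of 5), the scores are assumed to be whole numbers, and the function names are hypothetical.

```python
PREFERRED = [1, 2, 3, 5, 7, 10, 15, 20, 25, 30]   # truncated at 30 (assumption)

def choose_interval(scores, target_classes=15):
    """Steps 2-3: divide the range by 15, pick the nearest preferred size."""
    quotient = round((max(scores) - min(scores)) / target_classes, 1)
    return min(PREFERRED, key=lambda size: abs(size - quotient))

def interval_limits(scores, size):
    """Step 4: interval limits in descending order."""
    hi, lo = max(scores), min(scores)
    if size % 2 == 1:
        # Odd interval: the midpoint is the multiple nearest the highest score.
        mid = size * round(hi / size)
        lower = mid - size // 2
    else:
        # Even interval: the lower limit is a multiple of the interval size.
        lower = (hi // size) * size
    limits = []
    while lower + size - 1 >= lo:
        limits.append((lower, lower + size - 1))
        lower -= size
    return limits

def grouped_distribution(scores, size):
    """Steps 5-6: count the scores falling within each interval."""
    return [(low, high, sum(low <= s <= high for s in scores))
            for low, high in interval_limits(scores, size)]


scores = [45, 6, 75, 60, 78, 47, 65, 65, 63, 32,
          22, 35, 56, 74, 54, 72, 64, 72, 28, 39,
          47, 54, 73, 59, 78, 64, 63, 69, 86, 49,
          85, 69, 64, 79, 60, 60, 80, 52, 80, 84,
          61, 66, 62]
size = choose_interval(scores)
table = grouped_distribution(scores, size)
for low, high, f in table:
    print(f"{low}-{high}", f)
print("N =", sum(f for _, _, f in table))
```

With the badminton scores this yields an interval of 5 and the seventeen classes of Table XVII, from 83-87 down to 3-7.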

CONVERTING RAW SCORES INTO LETTER GRADES 

Raw scores can be quickly transferred into letter 
grades, when placed in a frequency distribution. Let us suppose 
that you have decided to base the grades in a particular course as 
follows: one-third to be determined by skill test scores, one-third 
by knowledge test scores, and one-third by subjective estimate. If 
your school uses letter grades, a percentage system such as 7% A, 
24% B, 38% C, 24% D, and 7% E (or failure) has probably been 
established. The limits of each letter grade for the various scores
in Table XVI can be determined by multiplying N (43) by each
of the respective percentages, rounding the figures. Seven percent
of 43 is 3.01 or 3, so the value of A is assigned to the top three scores.
If a plus and minus system is used, the score 86 would have the
value of A+, and the score 84 the value of A-. Twenty-four percent
of 43 is 10.32 or 10. The scores falling between 70 and 80 are given
B grades. Thirty-eight percent of 43 is 16.34 or 16. Counting sixteen
down, the C's will include scores from 69 to 57, inclusive. The num-
ber of D's will be the same as the number of B's, since the per-
centage is the same, so the next ten scores are assigned the value
of D. This includes the scores through 33, and four scores remain.
This is not an error but is the result of the rounding of numbers.
The question now is whether to assign the score 32 the value of D
or the value of E. Since 32 is closer to 35 than to 28, it seems fair
to place it in the D group, and extend the interval to include 32,
as shown in the table below.

                 Grade     F
81 and up          A       3
70-80              B      10
57-69              C      16
32-56              D      11
31 and below       E       3

                      N = 43


When this is done from a grouped frequency distribution, a few
more such complications may occur. For example, try using Table
XVII, which has the same data, but grouped in intervals of five
units. The top (A) interval now becomes 83 and up. The B's will
include the five recorded in the 78-82 interval, the three in the 73-
77 interval, and it will be necessary to include the four in the 68-72
interval. This gives you 12 B's. It is obvious that if you start the
C's by counting sixteen cases, you will again find yourself giving
too many and will end without any failures. To prevent this, sub-
tract the extra number of B's (2) from the total number of C's (16)
and give just 14 C's. Fortunately, this break comes at the end of
an interval, so scores 58-67 inclusive are assigned C's. You now start
with a clean slate, to assign 24% of the cases, or 10, to the D cate-
gory. The D's then include scores of 33-57. This time four scores
(32, 28, 22, and 6) remain for the failures. Your scoring table now
is as follows:

                 Grade     F
83 and up          A       3
68-82              B      12
58-67              C      14
33-57              D      10
32 and below       E       4


Sometimes it is simpler and gives a more equitable distribution 
of grades if the procedure is changed. After computing the correct 
number of cases for 7% and counting them off at the top, take the 
other end of the distribution and count up in reverse fashion to 
determine the limits of E. Likewise, when the limits for the 24% 
B’s have been set, count up from the E’s to determine the limits 
for the corresponding set of D’s. This may leave slightly more or 




less than 38% in the center for C’s, but this represents the average 
group and it is probably better to have it overloaded than to have 
one of the other categories altered. 
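
The counting described above, working down from the top of the simple distribution, can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and no provision is made for tied scores that straddle a boundary, which the teacher must still settle by judgment.

```python
PERCENTS = {"A": 7, "B": 24, "C": 38, "D": 24, "E": 7}

def grade_quotas(n, percents):
    """Number of cases for each grade: N times the percentage, rounded."""
    return {grade: round(n * p / 100) for grade, p in percents.items()}

def assign_from_top(scores, quotas, order=("A", "B", "C", "D")):
    """Count down from the highest score, filling each quota in turn.

    Whatever is left after the D's is returned under "remaining";
    ties at a boundary are NOT handled here.
    """
    ranked = sorted(scores, reverse=True)
    groups, start = {}, 0
    for grade in order:
        groups[grade] = ranked[start:start + quotas[grade]]
        start += quotas[grade]
    groups["remaining"] = ranked[start:]
    return groups


scores = [45, 6, 75, 60, 78, 47, 65, 65, 63, 32,
          22, 35, 56, 74, 54, 72, 64, 72, 28, 39,
          47, 54, 73, 59, 78, 64, 63, 69, 86, 49,
          85, 69, 64, 79, 60, 60, 80, 52, 80, 84,
          61, 66, 62]
quotas = grade_quotas(len(scores), PERCENTS)
groups = assign_from_top(scores, quotas)
for grade, members in groups.items():
    print(grade, len(members), members)
```

The rounded quotas total 42 rather than 43, which is why four scores, not three, are left after the D's have been counted off.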

It is also necessary to make intelligent modifications in the appli- 
cation of the percentage system. If the upper scores fall far short 
of expectations or fail to measure up to known standards for such 
a group, the teacher may reduce the number of A’s or eliminate 
them entirely. In this instance the extra number of cases will be 
added on to the B or C category, depending upon where there is 
the greatest concentration of scores. Also, on the low end of the 
distribution the poorest scores may not be really poor in terms of 
the teacher’s expectations for the class, in terms of known standards 
of performance, or in terms of actual difference from average or 
best scores for that particular group. Under these circumstances, 
the teacher would probably not give any E’s, but increase the num- 
ber of D’s or C’s. 

Also, if there are long gaps in the distribution they may form 
more natural and just divisions between categories than those de- 
fined by the percentage system. For example, by comparing Tables 
XVI and XVII, the latter shows a little more clearly that 28 is
closer to 32 than to 22. Working from Table XVII the teacher 
would be much more apt to assign only 2 E’s, to scores 22 and 6. 
This apparently large gap between 28 to 22 is partly the result of 
the limits defined for the step-interval; but the actual assignment of 
28 to a D or an E must finally be determined by a subjective esti- 
mate of how poor 28 really is. By looking at the total distribution 
and the difference between it and the maximum score it would 
appear really poor, but so would scores 32, 35 and 39. It is possible 
that in some cases the teacher might assign E’s to 39 and all scores 
below it, though this would probably be too drastic if it is a final 
term grade. 

The larger the number of cases in the distribution the fewer 
and shorter will be the gaps between scores. If the frequency dis- 
tribution shown in Table XVII is kept and scores added to it for 
a number of classes or seasons, and the limits are adjusted each 
time, more stable divisions will gradually evolve. 



AVERAGING LETTER GRADES 

It is often necessary to average several letter grades 
to obtain one grade for each student. If you are basing one-third 
of the grade on skill test scores, one-third on knowledge test scores,
and the other one-third on a rating of the student in the activity 
itself, you may have as many as three or more grades to average, 
depending on the amount of testing that you have done. These 
may be further complicated by the plus and minus system. Points 
can be assigned to each grade, totalled, and then divided by the 
number of grades, in a system somewhat similar to the honor point 
system. Suppose Jane Doe has the following record: A-, C+,
D, B-, E, C, B+, C-. By use of a point value table, these scores
can be converted into a total of 43 points. When divided by 8 this 
gives a point score of 5. This is equivalent to a grade of C. The 
point value table appears below. 

TABLE XVIII 

Point Value Table for Averaging Letter Grades 

A+ : 12     B+ : 9     C+ : 6     D+ : 3     E : 0

A  : 11     B  : 8     C  : 5     D  : 2

A- : 10     B- : 7     C- : 4     D- : 1

This table may also be used on the knowledge and rating phases 
of her final grade. Let us suppose that the eight scores above were 
skill test scores, so Jane now has a single score of C for skill, as 
measured by the eight tests. If she made a D on the final knowledge
test, the only one given, and was rated a B player, her score can be
determined without use of the above table. But if her knowledge
score was A+ and her rating D, with skill score C, the table may
be needed to obtain a total of 19 points, divided by 3, with the
result of 6.3, or an average grade of C+.
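
The point system of Table XVIII lends itself to a short routine. The sketch below is an illustration only; the function name is hypothetical, the second grade in Jane Doe's record is taken as C+ (the value that yields the stated total of 43), and averages are rounded to the nearest whole point, the text giving no rule for exact halves.

```python
POINTS = {"A+": 12, "A": 11, "A-": 10,
          "B+": 9,  "B": 8,  "B-": 7,
          "C+": 6,  "C": 5,  "C-": 4,
          "D+": 3,  "D": 2,  "D-": 1,
          "E": 0}
LETTERS = {points: letter for letter, points in POINTS.items()}

def average_grade(grades):
    """Return (total points, mean points, letter for the rounded mean)."""
    total = sum(POINTS[g] for g in grades)
    mean_points = total / len(grades)
    # Assumption: round to the nearest whole point value.
    return total, mean_points, LETTERS[round(mean_points)]


# Jane Doe's eight skill test grades.
skill = average_grade(["A-", "C+", "D", "B-", "E", "C", "B+", "C-"])
print(skill)

# Knowledge A+, rating D, skill C.
final = average_grade(["A+", "D", "C"])
print(final)
```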


MEASURES OF CENTRAL TENDENCY 

Before proceeding with the discussion of methods of 
converting raw scores into comparable scores (scores that can be 
averaged to obtain a single grade) measures of central tendency 




need to be considered. This discussion will be limited to the two 
most commonly used in handling physical education test data, 
namely, the mean and the median. 

The mean (M) does not involve a new concept; it is simply the 
arithmetical average. It is obtained by adding all of the scores and 
dividing by the number of individuals. It is based, then, on every 
score in the distribution. The mean being a mathematically de- 
rived value, may be used in further computations. 

The median (Mdn) is the middle measure in a series in which 
all of the measures have been arranged in the order of their size. It 
may be quickly computed from a frequency distribution by divid- 
ing the N by two and then counting up that number of frequencies. 
When the number of cases is small and one or more cases deviate 
markedly, the median will give a better representation of the typi- 
cal than will the mean, which is affected by all the scores. 

The mean and the median will be computed for the data pre-
sented in Table XVII.

FINDING THE ARITHMETIC MEAN OF A GROUPED
FREQUENCY DISTRIBUTION

The mean can be found by the laborious method of summing 
all the original scores, as presented in Table XV, and then dividing 
by N. When the number of cases is small, as in this example, that 
can be done without great effort. But frequently it must be com- 
puted from a grouped frequency distribution containing many 
cases. The steps involved are as follows: 

1. Select as an arbitrary reference point the interval which you
think is most likely to contain the actual mean. 

2. Express each interval above and below the interval contain-
ing the arbitrary reference point (A.R.) as a deviation. Record in 
the d column, as in the illustration. All the deviations below the 
A.R. must be preceded by a negative sign. 

3. Multiply the frequency in each interval by the corresponding 
d value, and record the products in the column headed fd. (Note 
that all those below the A.R. will have a negative sign.) 

4. Add the positive products, then the negative products, and 
add these sums algebraically. 

5. Divide this result obtained in Step 4 by N. This represents 
the correction to the A.R., expressed in interval units. To make 




the correction complete in score units, this quotient must be mul- 
tiplied by the size of the interval, (5, in the illustration). 

6. Add this product algebraically to the A.R., which for this 
purpose is the midpoint of that interval. (Add if positive, subtract 
if negative.) 

When studying Table XIX for the computation of the mean, 
disregard the column headed cf. 
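
The six steps may be sketched in Python as follows. The sketch is an illustration only, with a hypothetical function name; it carries the correction without intermediate rounding, so its result (about 60.47) differs slightly from the 60.45 obtained by hand when the correction is first rounded to .09.

```python
def grouped_mean(table, size, ar_mid):
    """Mean of a grouped distribution by the arbitrary-reference method.

    table  : list of (interval midpoint, frequency)
    size   : number of units in each interval
    ar_mid : midpoint of the interval chosen as arbitrary reference (A.R.)
    """
    n = sum(f for _, f in table)
    # Steps 2-4: deviation d of each interval from the A.R., in interval
    # units, multiplied by its frequency and summed algebraically.
    sum_fd = sum(f * ((mid - ar_mid) // size) for mid, f in table)
    # Step 5: correction in interval units, then in score units.
    correction = sum_fd / n * size
    # Step 6: add the correction to the midpoint of the A.R. interval.
    return sum_fd, ar_mid + correction


# (midpoint, frequency) for the intervals 83-87 down to 3-7.
TABLE = [(85, 3), (80, 5), (75, 3), (70, 4), (65, 8), (60, 6), (55, 3),
         (50, 2), (45, 3), (40, 1), (35, 1), (30, 2), (25, 0), (20, 1),
         (15, 0), (10, 0), (5, 1)]

sum_fd, mean = grouped_mean(TABLE, size=5, ar_mid=60)
print(sum_fd, round(mean, 2))
```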


TABLE XIX 


Computation of Mean and Median for Frequency Distribution 
on Badminton Clear Test 



               f      d      fd     cf
     83-87     3      5      15     43
     78-82     5      4      20     40
     73-77     3      3       9     35
     68-72     4      2       8     32
     63-67     8      1       8     28
                            +60
A.R. 58-62     6      0             20
     53-57     3     -1      -3     14
     48-52     2     -2      -4     11
     43-47     3     -3      -9      9
     38-42     1     -4      -4      6
     33-37     1     -5      -5      5
     28-32     2     -6     -12      4
     23-27     0     -7              2
     18-22     1     -8      -8      2
     13-17     0     -9              1
      8-12     0    -10              1
      3-7      1    -11     -11      1
                            -56

Σfd = 60 - 56 = 4

Σfd / N = 4 / 43 = .093 or .09 (rounded)

.09 is the correction in interval units

Correction in score units = 5 x .09 = .45

Mean = A.R. + correction in score units = 60 + .45 = 60.45


FINDING THE MEDIAN OF A GROUPED 
FREQUENCY DISTRIBUTION 


The median in a simple frequency distribution, as
shown in Table XVI, can be quickly ascertained by dividing N by
two and finding that point on the scale above or below which half
of the frequencies lie. In the example in Table XVI the median
would fall at 63. To compute it from a grouped frequency distribu-
tion, the fact that we have lost the identity of the original scores




necessitates a somewhat more complicated procedure. The first step 
is to add a cumulative frequency column to the frequency distribu- 
tion. See the column headed cf in the illustration in Table XIX. 
This is done by totalling the frequencies, interval by interval from 
the bottom up. The median is by definition the middle score in the 
distribution, and is located by the formula N -f- i if N is an odd 

2 

number. (If N is an even number, then the median is computed 
as half way between the two central scores; i.e., if N is 44, the 
median would be half way between the twenty-second and twenty- 
third scores.) In this case N is 43, therefore the median is 
43 -f-i = 22. It is noted immediately that the twenty-second score 
2 

falls in the interval 63-67. The next step is to subtract from 22 
the number in the cumulative frequency column just below (20). 
Then, divide this difference (2) by the frequency in this interval 
(8). Multiply this quotient (2 = .25) by the size of the interval (5). 

8 

Add this product (1 .25) to the lower limit of the interval (63 -f- 1 .25 
is 64.25). The median then is 64.25, or in round numbers 64. 
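The procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name grouped_median is hypothetical, and the sketch follows the text in adding the interpolated amount to the printed lower limit of the interval (63).

```python
# Table XIX as (printed lower limit, frequency), lowest step-interval first.
INTERVALS_ASC = [
    (3, 1), (8, 0), (13, 0), (18, 1), (23, 0), (28, 2), (33, 1), (38, 1),
    (43, 3), (48, 2), (53, 3), (58, 6), (63, 8), (68, 4), (73, 3), (78, 5),
    (83, 3),
]


def grouped_median(intervals, size):
    """Median of a grouped distribution by interpolation within an interval."""
    n = sum(f for _, f in intervals)
    # (N + 1) / 2 is the middle score for odd N; for even N it falls
    # half way between the two central scores, as the text requires.
    position = (n + 1) / 2
    below = 0                      # cumulative frequency below this interval
    for lower, f in intervals:
        if f and below + f >= position:
            return lower + (position - below) / f * size
        below += f
    raise ValueError("the distribution contains no scores")


print(grouped_median(INTERVALS_ASC, size=5))
```

Running the cumulative count inside the loop stands in for the cf column, so the same function can be applied to any grouped distribution without preparing that column by hand.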

The effect of the one extremely low score on the mean is now
apparent when a comparison is made between the mean and the
median. The mean is 60.45 and the median 64.25. In a normal
distribution, the mean and the median would coincide. The one low
score affects the median no more in its actual location than it
would if it lay several intervals in either direction from that
location. The median, in this case, gives a better description of
the typical individual’s score. The mean, because of its use in
other computations, is important.

MEASURES OF DISTRIBUTION 

When the mean or median has been determined as a
measure of central tendency it gives only a single value which is
to be used to represent the total distribution. This measure does
not tell anything about how much the scores deviate from it or with
what frequency they approach it. For example, if a group of sixth
grade boys has taken the standing broad jump and you know that the
average jump was 66 inches you might assume that many scored
near that point. However, you do not really know about the
performance of the total group until you have some measure of the
distribution or spread. Or in the previous example of the badminton
clear test, if you know only that the mean is 60.45 you are
entirely unaware of the great variability of scores presented in that
distribution.

The distribution around the median is usually interpreted by
use of the quartile (Q), and that with the mean by use of the
standard deviation (S.D. or σ). The quartile simply splits each half
designated by the median. The first and third quartile points are
located by exactly the same process as the median was.

The standard deviation is computed from a grouped frequency
distribution as in Table XIX. Add to that table another column
marked fd². The values in this column are obtained by multiplying
those in the fd column by those in d. The computation then
proceeds as in Table XX.


TABLE XX

Computation of Standard Deviation for Frequency Distribution
on Badminton Clear Test

Step-interval      f      d     fd    fd²
83-87              3      5     15     75
78-82              5      4     20     80
73-77              3      3      9     27
68-72              4      2      8     16
63-67              8      1      8      8
58-62              6      0
53-57              3     -1     -3      3
48-52              2     -2     -4      8
43-47              3     -3     -9     27
38-42              1     -4     -4     16
33-37              1     -5     -5     25
28-32              2     -6    -12     72
23-27              0     -7
18-22              1     -8     -8     64
13-17              0     -9
8-12               0    -10
3-7                1    -11    -11    121
                                      542

$\Sigma fd^2 = 542$

$\frac{\Sigma fd^2}{N} = \frac{542}{43} = 12.6$

c = .09 (computed in Table XIX)

$c^2 = .008$

$\text{S.D.} = \sqrt{\frac{\Sigma fd^2}{N} - c^2} \times \text{size of step-interval}$

$\sqrt{12.6 - .008} = \sqrt{12.592} = 3.6$

$3.6 \times \text{size of step-interval} = 3.6 \times 5 = 18.0$






The fd² column is added ($\Sigma fd^2 = 542$), then divided by N
($\frac{\Sigma fd^2}{N} = 12.6$). The correction, computed for the
arbitrary reference point in obtaining the mean, is squared and
then subtracted from the last quotient ($12.6 - .008 = 12.592$).
The square root of this number is then determined to complete the
last step in the formula
($\text{S.D.} = \sqrt{\frac{\Sigma fd^2}{N} - c^2} = 3.6$). This
gives the standard deviation in terms of step-intervals. To
interpret in terms of raw scores multiply by the size of the
step-interval ($3.6 \times 5 = 18.0$).

See Figure 4, p. 27 for the illustration of percentages within
each S.D. distance in the distribution. The S.D. is small if the
total range is small and the cases cluster closely around the mean.
The S.D. becomes larger as the range increases and the cases
spread. The proportion of cases between any two S.D. points in
the distribution remains the same as long as the curve maintains
characteristics resembling the normal curve. (See Figure 4.) If the
distribution becomes too dissimilar to the normal curve the S.D.
should not be used.
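The computation of the standard deviation from the f and d columns can be sketched in Python as below. The function name grouped_sd is an assumption made for illustration. The sketch keeps full precision at every step, so it gives about 17.7 in score units, where the rounded hand computation in the text arrives at 18.0.

```python
import math

# (f, d) pairs from Table XX; d is the deviation of each step-interval
# from the arbitrary reference point, in interval units.
F_D = [
    (3, 5), (5, 4), (3, 3), (4, 2), (8, 1), (6, 0), (3, -1), (2, -2),
    (3, -3), (1, -4), (1, -5), (2, -6), (0, -7), (1, -8), (0, -9),
    (0, -10), (1, -11),
]


def grouped_sd(f_d, size):
    """Standard deviation of a grouped distribution, in score units."""
    n = sum(f for f, _ in f_d)
    sum_fd = sum(f * d for f, d in f_d)
    sum_fd2 = sum(f * d * d for f, d in f_d)    # total of the fd² column
    c = sum_fd / n                              # correction in interval units
    sd_in_steps = math.sqrt(sum_fd2 / n - c * c)
    return sd_in_steps * size, sum_fd2


sd, sum_fd2 = grouped_sd(F_D, size=5)
print(sum_fd2, round(sd, 1))
```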

The percentile rankings of scores may be computed in connec- 
tion with either the median or the mean. However, the percentiles 
are usually used for interpretation of raw scores, particularly for 
comparing abilities measured by different tests. The scores range 
from zero to the 100th percentile. A score having a percentile 
ranking of 34 is better than 34 percent of the cases and poorer 
than 66 percent of them. (See any statistics text for the technique 
of computation.) 


CONVERSION OF RAW SCORES INTO 
COMPARABLE VALUES 

The T-scale can be very easily constructed on raw
scores collected on any class or group of similar classes. There
should be a minimum of fifty students represented in the scale,
preferably a hundred or more. For this reason, if there are two
or three sections of the same age group working on the same
activity, it is better to combine the scores from all classes into a
single distribution, rather than using a separate one for each
section. This combination puts the standards represented by the
scale on a school basis rather than entirely on their own particular
group.

The T-scales presented in previous chapters are based on several 
sections within the same school or on similar classes for several 
schools. The number of subjects is always noted and usually the 
source of the subjects. 

The T-scale is based on the characteristic distribution of cases
in a normal curve or in one which approximates that curve. The
teacher or person who constructs the scale should understand
something about that curve, viz., the two ends are symmetrical, the
scores concentrate heavily at the center, the two extremes are at
approximately equal distance from the center and the frequencies
near the extremes fall off very rapidly. See Figure 4 for the
frequencies within each standard deviation. The T-scale is based
directly on the standard deviation. The middle score is arbitrarily
assigned a T-score of 50 and since the mean is at the center of the
curve the two are identical. The normal curve extends three
standard deviations in each direction. By assigning 10 T-scores to
each standard deviation the scale then must range from
approximately 20 to 80. Since the scores are massed near the center
there will be many who receive a score in the 40’s or 50’s. There
will be considerably fewer who receive 30’s or 60’s, and
comparatively few who receive 20’s and 70’s.

Such a scale is easily explained to students even of high school 
age. They are familiar with the concept of the average. They will 
understand the following explanation. Scores near fifty are about 
average, those considerably above or below 50 represent good or 
poor performance respectively. In addition to enabling the stu- 
dent to understand his relative ability on the single test he may 
also make a similar comparison of his performance on several tests. 

From the standpoint of the teacher or test administrator it is 
possible to add or average T-scores for composite ratings. This is 
a short cut to a score on a test battery, and is sometimes the only 
means of obtaining a single score on a series of tests. 




TABLE XXI

Computation of a T-Scale for 200 Cases on a Motor Ability Battery

Step-interval  Tallies                f     t   f/2   t + f/2        %  T-score
156-up         /                      1     0    .5        .5     .250       78
154-155        /                      1     1    .5       1.5     .750       74
152-153        /                      1     2    .5       2.5    1.250       72
150-151        //                     2     3   1.0       4.0    2.000       71
148-149        ///                    3     5   1.5       6.5    3.250       68
146-147        ////                   4     8   2.0      10.0    5.000       66
144-145        ///                    3    12   1.5      13.5    6.750       65
142-143        ///                    3    15   1.5      16.5    8.250       64
140-141        /////                  5    18   2.5      20.5   10.250       63
138-139        /////                  5    23   2.5      25.5   12.750       61
136-137        ///// //               7    28   3.5      31.5   15.750       60
134-135        ///// /                6    35   3.0      38.0   19.000       59
132-133        ///// ///              8    41   4.0      45.0   22.500       58
130-131        ///// /////           10    49   5.0      54.0   27.000       56
128-129        ///// ///// ////      14    59   7.0      66.0   33.000       54
126-127        ///// ///// ///       13    73   6.5      79.5   39.750       53
124-125        ///// ///// ///// /   16    86   8.0      94.0   47.000       51
122-123        ///// ///// //        12   102   6.0     108.0   54.000       49
120-121        ///// /////           10   114   5.0     119.0   59.500       48
118-119        ///// ////             9   124   4.5     128.5   64.250       46
116-117        ///// /////           10   133   5.0     138.0   69.000       45
114-115        ///// ///// /         11   143   5.5     148.5   74.250       43
112-113        ///// ////             9   154   4.5     158.5   79.250       42
110-111        ///// ///              8   163   4.0     167.0   83.500       40
108-109        ///// //               7   171   3.5     174.5   87.250       39
106-107        ///// //               7   178   3.5     181.5   90.750       37
104-105        /////                  5   185   2.5     187.5   93.750       34
102-103        ///                    3   190   1.5     191.5   95.750       33
100-101        //                     2   193   1.0     194.0   97.000       31
98-99          //                     2   195   1.0     196.0   98.000       29
96-97          /                      1   197    .5     197.5   98.750       28
94-95          /                      1   198    .5     198.5   99.250       26
93-down        /                      1   199    .5     199.5   99.750       22

N = 200










































































































































































































The steps in construction of a T-scale follow: 

1. Make a frequency distribution of scores to be used. (This
is the same procedure as outlined above except that in this case
the size of the step-interval must be small. If intervals are too large
there will be too much massing of raw scores and the T-scale will
be broken rather than continuous. In other words, the scale will
not discriminate between performance of different persons with
similar scores.)

2. Total the frequencies in the column marked f. 

3. Count the total of all frequencies above each interval and
put the total in column t. For example, there is nothing above the
top interval, therefore, the t column reads 0 in the first interval;
in the successive intervals of t, the value will be the sum of f and t
of the interval above. As a check on accuracy, the sum of f and t of
the last interval always should be equal to N.

4. Divide f values for each interval by 2. Therefore, this column
is called the f/2 column.

5. Add the t and f/2 columns for each interval; label the column
accordingly t + f/2. (The purpose of this step is to find the
number of frequencies above the midpoint of each interval. This is
done because the midpoint of the interval is always considered as
representative of all cases in the interval. This is another reason
for keeping the size of the step-interval small.)

6. Divide the t + f/2 column by N and multiply by 100. This
column is called the % column. (Carry the percentages to at least
three decimal places.)
7. Read in Table XXII the standard deviation value corre- 
sponding to each percentage value. This is the T-score and is in- 
serted in a column so labelled. (Use the T-score to the nearest 
whole number rather than in decimals.) 

8. Eliminate the computation leaving only the values for the 
step-interval and the corresponding T-score. If this is entirely for 
your own use the quickest way is to simply fold the sheet over 
so that the T-score column is beside the step-interval column. If 
you wish a more permanent form or wish to post it, copy only 
those two columns on another sheet, perhaps including T-scales 
for several tests on the same sheet for greater convenience. (See
p. 40.) 
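Steps 2 through 7 can be sketched in Python as below, using the frequencies of Table XXI. This is a hedged illustration, not the authors' procedure: the function name t_scale is hypothetical, and in place of reading Table XXII the sketch computes the same conversion from the normal curve with the standard library's NormalDist. Since the printed table is read to the nearest entry, a T-score found this way may differ by a point from one found by lookup.

```python
from statistics import NormalDist

# Column f of Table XXI, from the top step-interval down.
FREQUENCIES = [1, 1, 1, 2, 3, 4, 3, 3, 5, 5, 7, 6, 8, 10, 14, 13, 16,
               12, 10, 9, 10, 11, 9, 8, 7, 7, 5, 3, 2, 2, 1, 1, 1]


def t_scale(freqs_top_down):
    """Return one (t, percent, T-score) row per step-interval."""
    n = sum(freqs_top_down)                      # step 2: total of column f
    normal = NormalDist()                        # mean 0, S.D. 1
    above = 0                                    # step 3: column t
    rows = []
    for f in freqs_top_down:
        percent = (above + f / 2) / n * 100      # steps 4-6: (t + f/2) / N
        # Step 7, computed instead of read from Table XXII: the percent of
        # cases above a point fixes its distance from the mean in S.D. units.
        z = normal.inv_cdf(1 - percent / 100)
        rows.append((above, percent, round(50 + 10 * z)))
        above += f
    assert above == n                            # the accuracy check in step 3
    return rows


rows = t_scale(FREQUENCIES)
print(rows[0], rows[16], rows[-1])
```

The three rows printed are the top interval, the interval 124-125, and the bottom interval of Table XXI.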




TABLE XXII

Conversion of Percentages into T-Scores

T-Scores*  Per cent    T-Scores  Per cent   T-Scores  Per cent   T-Scores  Per cent
0          99.999971   25        99.38      50        50.00      75        0.62
0.5        99.999963   25.5      99.29      50.5      48.01      75.5      0.54
1          99.999952   26        99.18      51        46.02      76        0.47
1.5        99.999938   26.5      99.06      51.5      44.04      76.5      0.40
2          99.99992    27        98.93      52        42.07      77        0.35
2.5        99.99990    27.5      98.78      52.5      40.13      77.5      0.30
3          99.99987    28        98.61      53        38.21      78        0.26
3.5        99.99983    28.5      98.42      53.5      36.32      78.5      0.22
4          99.99979    29        98.21      54        34.46      79        0.19
4.5        99.99973    29.5      97.98      54.5      32.64      79.5      0.16
5          99.99966    30        97.72      55        30.85      80        0.13
5.5        99.99957    30.5      97.44      55.5      29.12      80.5      0.11
6          99.99946    31        97.13      56        27.43      81        0.097
6.5        99.99932    31.5      96.78      56.5      25.78      81.5      0.082
7          99.99915    32        96.41      57        24.20      82        0.069
7.5        99.9989     32.5      95.99      57.5      22.66      82.5      0.058
8          99.9987     33        95.54      58        21.19      83        0.048
8.5        99.9983     33.5      95.05      58.5      19.77      83.5      0.040
9          99.9979     34        94.52      59        18.41      84        0.034
9.5        99.9974     34.5      93.94      59.5      17.11      84.5      0.028
10         99.9968     35        93.32      60        15.87      85        0.023
10.5       99.9961     35.5      92.65      60.5      14.69      85.5      0.019
11         99.9952     36        91.92      61        13.57      86        0.016
11.5       99.9941     36.5      91.15      61.5      12.51      86.5      0.013
12         99.9928     37        90.32      62        11.51      87        0.011
12.5       99.9912     37.5      89.44      62.5      10.56      87.5      0.009
13         99.989      38        88.49      63        9.68       88        0.007
13.5       99.987      38.5      87.49      63.5      8.85       88.5      0.0059
14         99.984      39        86.43      64        8.08       89        0.0048
14.5       99.981      39.5      85.31      64.5      7.35       89.5      0.0039
15         99.977      40        84.13      65        6.68       90        0.0032
15.5       99.972      40.5      82.89      65.5      6.06       90.5      0.0026
16         99.966      41        81.59      66        5.48       91        0.0021
16.5       99.960      41.5      80.23      66.5      4.95       91.5      0.0017
17         99.952      42        78.81      67        4.46       92        0.0013
17.5       99.942      42.5      77.34      67.5      4.01       92.5      0.0011
18         99.931      43        75.80      68        3.59       93        0.0009
18.5       99.918      43.5      74.22      68.5      3.22       93.5      0.0007
19         99.903      44        72.57      69        2.87       94        0.0005
19.5       99.886      44.5      70.88      69.5      2.56       94.5      0.00043
20         99.865      45        69.15      70        2.28       95        0.00034
20.5       99.84       45.5      67.36      70.5      2.02       95.5      0.00027
21         99.81       46        65.54      71        1.79       96        0.00021
21.5       99.78       46.5      63.68      71.5      1.58       96.5      0.00017
22         99.74       47        61.79      72        1.39       97        0.00013
22.5       99.70       47.5      59.87      72.5      1.22       97.5      0.00010
23         99.65       48        57.93      73        1.07       98        0.00008
23.5       99.60       48.5      55.96      73.5      0.94       98.5      0.000062
24         99.53       49        53.98      74        0.82       99        0.000048
24.5       99.46       49.5      51.99      74.5      0.71       99.5      0.000037
                                                                 100       0.000029

* T-scores are S.D. values.












THE EVALUATION OF KNOWLEDGE TEST 
QUESTIONS 

A simple method for arriving at an approximate
estimate of the value of multiple choice questions is described
here.

First, score the papers. See the method, described in Chapter 3,
of superimposing on the answer sheets an especially prepared
key, with holes punched where the correct answers should appear.
The total of questions answered correctly should be recorded on
the answer sheet. Second, arrange the answer sheets according to
scores. Third, transfer the data to tabulation sheets, prepared for
this purpose. See Figure 42 for a sheet prepared to accommodate
data on five questions. Much teacher time can be saved if these
sheets, or similar ones of your own design, can be duplicated or
mimeographed for your use. They can be used for all examinations
of multiple choice type. For example, a portion of the sheet is
illustrated below. The record of a student with a total score of 30
would be recorded in the step-interval of 30-31. If he selected the
correct response for question 17, a tally mark would be placed
in the column headed “R,” or right responses. The record of a
student with a total score of 29 would be entered in the
step-interval 28-29. If 3 is the correct response and he selected 4,
his choice would be recorded in the column headed “W, O,”
indicating wrong responses or omissions. Note that the correct
response is shown in brackets at the bottom of the column, and the
question number is placed at the top of the column. The record of a
person with a total score of 22 who omitted the question is also
shown in the illustration.


                17
     R                    W, O
     /        30-31
              28-29         4
              26-27
              24-25
              22-23         O
               (3)





[Figure 42. Sample Tabulation Sheet. The sheet carries five
identical question columns, each headed R and W, O, with the
step-intervals printed down the center from 64-65 at the top to
0-1 at the bottom.]





























































































































Since the tabulation sheet has room for five questions, it will 
save time if the recorder marks the answer sheets with a ruled 
line at the end of each five questions and records the answers to the 
first five questions before going on to the next five. The process 
of transferring the data to the tabulation sheets can be done fairly 
rapidly. 

The number who succeeded on a question is readily obtained
by counting the tally marks in the column headed “R”; and the
number failing by counting each entry appearing in the “W, O”
column as one.


TABLE XXIII

Percentage Minimums for Functioning of Items

   Table for 3%           Table for 4%           Table for 5%
N          Minimum     N          Minimum     N          Minimum
Below 50      1        Below 38      1        Below 30      1
50-83         2        38-62         2        30-49         2
84-116        3        63-87         3        50-69         3
117-150       4        88-112        4        70-89         4
151-183       5        113-137       5        90-109        5
184-216       6        138-162       6        110-129       6
                       163-187       7        130-149       7
                       188-212       8        150-169       8
                                              170-189       9
                                              190-209      10



One of the first things the teacher wants to know is how difficult
each question proved to be for the group. This is known as the
difficulty rating of a question, and it is obtained by dividing the
number of errors and omissions by the number who took the test
and expressing the quotient as a percentage. Thus, a question with
a difficulty rating of 40 is less difficult than a question with a
difficulty rating of 45. In other words, the higher the difficulty
rating the more difficult the question. Teachers will have to decide
for themselves what limits they will use, but they certainly would
not want to retain many questions with a difficulty rating above 90
or below 10.

The teacher will want to know how the responses functioned in
each question. He will not want to retain responses which were
not selected by any of the students taking the test. Sometimes it
is advisable to set a limit or minimum, such as three percent, and
decide to drop from further use any response not selected by at
least three percent of the total number of persons taking the test.
Thus, if the total number taking the test, or N, was 100, a
response would have to be selected by at least three persons, while
if N was 75, the minimum would be two persons.

If fewer than three responses in a question function, as
arbitrarily defined by a choice of percent, then the question should
be dropped or revised as it is no longer a multiple choice question.

DIFFICULTY RATING AND ITEM ANALYSIS 

As an example of the difficulty rating and item
analysis, take the question described earlier on trapping in soccer
(Chapter 8, p. 186). When this question was given in a soccer
knowledge test to forty-eight players, four chose part one. A glance
at Table XXIII will show you that this was an adequate number
on any of the percentage levels listed. Eight chose part two;
seventeen chose the correct response, part three. Twelve selected
part four, while four selected part five. Three of the forty-eight
omitted the question. Therefore, if you were using any of the
percentage minimums for functioning of items listed above, you
would retain all five parts. The difficulty rating, obtained by
dividing the total errors and omissions (4 + 8 + 12 + 4 + 3 = 31)
by N (48), is 65.
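The soccer example can be checked with a short Python sketch. The function name item_summary and its parameters are assumptions made for illustration; the minimum is passed in by hand, having been read from Table XXIII (2 for N = 48 on the 4% and 5% tables).

```python
def item_summary(choices, correct, omitted, minimum):
    """Difficulty rating and functioning responses for one question.

    choices maps each response number to the number of students choosing
    it; minimum is the value read from Table XXIII for this N.
    """
    n = sum(choices.values()) + omitted
    errors = n - choices[correct]            # wrong responses plus omissions
    difficulty = round(100 * errors / n)     # percent failing the question
    functioning = [r for r, count in choices.items() if count >= minimum]
    return n, errors, difficulty, functioning


n, errors, difficulty, functioning = item_summary(
    {1: 4, 2: 8, 3: 17, 4: 12, 5: 4}, correct=3, omitted=3, minimum=2)
print(n, errors, difficulty, functioning)
```

All five responses meet the minimum, so the question remains a five-part multiple choice question, and the rating of 65 agrees with the hand computation.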

The worth of an item depends not only on its difficulty and the
functioning of the responses, but upon its desirability for
inclusion in the test as a whole, and upon its power to discriminate
between students of high and low levels of general achievement in
the subject involved. A question may be said to have perfect
discriminating power when every student who answers the question
correctly ranks higher in the scale than all students who answer
it incorrectly. A question in which more students of low ability
succeed than students of high ability is said to have minus
discriminating power, and certainly is a poor question. Between the
extremes of perfect and minus discriminating powers, questions of
all degrees of discrimination are found. An inspection of the item
analysis for each question, if tabulated on a form similar to the
one described here, will give you an estimate of how well the
question discriminates between the various levels of ability.

INDEX OF DISCRIMINATION 

According to Lindquist and Cook® evaluation of 
the degree of effectiveness of the various items must be based on 
a single quantitative measure which can be conveniently com- 
puted and easily compared for many items. Such a measure is 
referred to as the index of discrimination, i.e., the index of the
effectiveness with which the item discriminates between indi- 
viduals of different levels of information or ability. Various in- 
dices of discrimination have been studied and compared by 
research workers but no one index is infallible for all situations or 
for all groups of students, or levels of ability. Lindquist and Cook 
expressed a belief that the problem of securing such a measure was 
not solved. Likewise, Long and Sandiford, 1 while discussing the 
same problem, stated that the better indices differ so little in effec- 
tiveness that the selection might justifiably be made on the basis of 
ease of computation. An index of discrimination that has proven 
satisfactory for the authors is the one recommended by Swineford 10
for a heterogeneous group of subjects. It is the difference between
the mean total score of those persons succeeding on the item and
of those persons failing on the item, or as expressed in formula
form, $M_r - M_w$. It is easy to calculate, using the tabulation
sheets, and since it involves determining the mean rather than the
median, it is based on all the data secured. It is easier, however,
to use the median, since the median is more readily calculated than
the mean, and apparently the results are quite comparable. For
example, in one examination containing 86 questions an index was
computed for each question using the mean, and another index using
the median in the same formula. In this examination, 81 percent of
the questions received the same evaluation, satisfactory or
unsatisfactory, by the two forms of the index; 14 percent were
rejected by use of the median and not with the mean; 5 percent were
accepted by use of the median and not with the mean. Similar results
have been found on other examinations. The choice then would seem
to be primarily one of preference by the teacher, with a slight
advantage in favor of the median because of economy of time.

The minimum size of an acceptable index of discrimination 
must be determined. Any question with an index of discrimina- 
tion of less than two-thirds the size of one standard deviation 
should be subjected to further study before it is decided to retain 
or drop it. The final decision on such borderline questions should 
be made on the basis of its importance in the content distribution 
and on its difficulty rating. 
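The index and the two-thirds criterion can be sketched in Python as below. The ten total scores and the right-or-wrong results are hypothetical data invented for illustration, and the function name discrimination_index is likewise an assumption. The sketch also assumes that the standard deviation meant is that of the total scores on the test, computed with N in the divisor as in Table XX.

```python
from statistics import mean, pstdev


def discrimination_index(total_scores, answered_right):
    """M_r - M_w: mean total score of those succeeding on the item
    minus the mean total score of those failing on it."""
    right = [s for s, ok in zip(total_scores, answered_right) if ok]
    wrong = [s for s, ok in zip(total_scores, answered_right) if not ok]
    return mean(right) - mean(wrong)


# Hypothetical total scores for ten students and their results on one item.
scores = [30, 28, 27, 25, 24, 22, 21, 19, 18, 16]
right = [True, True, True, False, True, True, False, False, True, False]

index = discrimination_index(scores, right)
threshold = 2 / 3 * pstdev(scores)      # two-thirds of one S.D. (assumed basis)
needs_further_study = index < threshold
print(round(index, 2), round(threshold, 2), needs_further_study)
```

Replacing mean with median from the same module gives the quicker form of the index that the text describes.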

ESTABLISHMENT OF CRITERION SCORES 
THROUGH CONDUCT OF RATINGS 

In conducting ratings which are to be used as a cri- 
terion for evaluating tests, it is necessary that the ability being 
rated be carefully defined. If it is playing ability in general, then 
the raters need to discuss what they consider excellent playing 
ability, what good, what average, and so on through the range. 
The size of the range depends upon the amount of discrimination 
that is desired. The judges should either be “experts” in their 
knowledge of the activity, or carefully trained students. The mini- 
mum number recommended is three. After they have met and dis- 
cussed the various points on the scale, they should have time to 
become familiar with the chart that they are to use before the 
actual rating begins. The code for marking should be placed on 
the chart. The rating form may be subdivided into skills, such as 
(for volley ball): serve, set-up, pass, volley, and recovery from 
the net. 

The players must be identified in some manner. Colored pinnies 
with numbers on both front and back are helpful. 

The raters should work independently. The length of time that 
they watch the players will vary with the activity, but they should 
see each player active for a long enough time to be able to rank him 
in each of the units, and to give him a composite score. 

After the judges have completed ratings and have given a score
to each player, the scores should be totalled (see p. 238 for reason).
Sometimes these are weighted by giving extra value to one judge’s
opinion, as for example that of the instructor of the class, who has
seen the group for a longer period of time. In this case, his ratings
might be multiplied by a constant such as two.

Agreement of judges can be determined by the correlation
technique. If the judges are well trained and the length of time for
observation is sufficient, the coefficient should be high (at least .80).
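The totalling, weighting, and agreement check can be sketched in Python as below. The ratings are hypothetical, the function name pearson_r is an assumption, and the product-moment coefficient is used here as one common form of "the correlation technique"; the text does not specify which coefficient is intended.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two judges' ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings of six players by three judges.
instructor = [5, 4, 4, 3, 2, 1]
judge_b = [5, 5, 3, 3, 2, 2]
judge_c = [4, 4, 4, 2, 3, 1]

# Criterion score: the totalled ratings, the instructor's weighted by two.
criterion = [2 * a + b + c for a, b, c in zip(instructor, judge_b, judge_c)]
r_ab = pearson_r(instructor, judge_b)
print(criterion, round(r_ab, 2))
```

With three judges the coefficient would be found for each pair in turn and each compared with the .80 standard.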

PROCEDURE FOR CONSTRUCTING 
A MOTOR TEST BATTERY 

The steps in construction of a motor test battery are 
essentially the same for all types. These steps will be outlined here 
and an illustration of the development of a sports battery will be 
given. It should be noted that it is a combination of logical plan- 
ning and statistical analysis. 

1. Study the Problem or Need for the Test

What tests are available which are purported to serve this
particular purpose? What is the statistical evidence of their value?
Are they practical to use in the space and time available? Are they
designed for the age and skill of the group for which they are to
be used? Are the standards suitable for this group? If the answers
to most of these questions are unsatisfactory, then work on a
battery would be indicated.

2. Analyze the Ability to be Measured

List the principal skills involved; rate their importance for 
success in the activity. If volley ball tests were under consideration, 
the analysis might develop as follows: 

1. Skills involved 

1. Receiving the ball

a. from service

b. from across the net

c. from teammate’s set-up

2. Playing the ball

a. service 

b. pass to teammate 

c. volley across the net 

d. spike across the net 

e. set-up to teammate 






3. Footwork 

a. avoiding foul on service xx 

b. following into court after service xx 

c. getting under the ball x 

d. filling opening left by teammate x 

e. avoiding foul at center line x 

f. jumping x 


The importance of a skill is rated according to the frequency 
with which it occurs in the game, the frequency with which it 
presents a problem to the player, and its relative significance for 
successful playing. From this chart it will be noted that the most 
important aspects of the game for most players are to be able to put 
the ball in play; to receive it, i.e., keep it in play and not be re- 
sponsible for the ball becoming dead; and to play the ball either 
to a teammate or across the net. The service is rated as slightly 
less important among these skills because the player must share 
the opportunities for service with all the teammates. On the other 
hand, theoretically he may be expected to play on every point 
served by the opponents as well as his own team; actually he does 
not play that often, the frequency varying with his position on the 
floor and the style of the game. 

The footwork seems to be very closely related to receiving 
and playing the ball, and of little significance in itself. Its relation- 
ship to service seems most important for it is often on failure in 
this respect that a point is lost. Opportunities for other items listed 
in footwork occur infrequently or are really a part of some other 
skill; therefore, they are ranked low. The other items of low rating 
occur only occasionally, or for a few players on the team rather 
than for the team as a whole. 

3. Select the Experimental Items 

The previous study of available tests may have yielded some
of value, or have suggested forms which may be adapted easily.
Many times there may be little available in a desirable form for
the important aspects selected in step 2. In this case one’s
ingenuity constitutes the chief source of ideas. It is usually
advisable to try out these ideas on small groups to be sure that
they will work, that the dimensions and markings are satisfactory,
and the scoring scheme feasible. This should lead one to a written
description of the test, the instructions to be given the students,
number of trials to be used, and plans for administering and
recording results.

In the example of volley ball the test to be selected or de- 
vised would deal principally with service, receiving and volleying 
the ball. More than one service test might be tried in an experimen- 
tal stage. They might differ on the basis of different theories of a 
good service. In one case the players may be coached to serve into 
the corners or at least around the edges of the court with little
consideration for the flight of the ball. Then the test would call 
for a net, floor markings on one side of the court giving higher 
values to areas near the boundary lines than to that in the 
center of the court. On the other hand the players may be coached 
to play balls into the corners with a fast ball traveling in a flattened 
arc. Then the test would call for a rope somewhere above the net 
with reduced scoring points for each ball going above the rope. 

Likewise, teams may be taught to use a variety of placement on 
service, or to serve into weak spots on the opponents’ team. Such 
a situation might call for a very different target area on the floor. 

In the experimental stages, much time will be saved if a single
test can be given, and the results recorded in such a way as to 
permit scoring in different ways to represent different tests. It 
is essential in this case to be definite on the point of aim. For 
example, if the area A (Figure 43) just in front of the baseline is 
considered the most desirable area for service then all students 
aim for that. By use of a scoring chart showing the actual court, 
each serve may be marked for its exact point of contact. It is 
then possible to score the test in as many ways as seem desirable. 
The shaded area may be subdivided into corners and center (b) or 
left as an undivided area (a); the space in front may be divided 
into zones parallel to A (c), or subdivided into zones by lines at 
right angles to A (d); the space behind A (e) may be scored as 
zero, or the whole area scored a little less than A, or subdivided 
into zones the same as those in front of A (f). Each player would 
have a score sheet in the form of a court outline. The scorer 
records the number of the trial as indicated in (a). It is then pos- 
sible to construct new markings on the chart and score in any 
way desired. 
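
The idea of recording each serve once and scoring it in several ways may be sketched as follows. This is only an illustration: the court dimensions, the depth of area A, the two scoring schemes, the point values, and the landing points are all hypothetical and are not taken from Figure 43.

```python
# Hypothetical coordinates: x runs across the court (0 to 30 ft), y runs from
# the net (0) to the baseline (30 ft). Area A is ASSUMED to be the 5 ft strip
# just in front of the baseline. A ball landing outside the court is "out".
COURT_WIDTH = 30.0
COURT_DEPTH = 30.0
AREA_A_START = 25.0


def in_court(x, y):
    """True if the recorded point of contact lies inside the court."""
    return 0.0 <= x <= COURT_WIDTH and 0.0 <= y <= COURT_DEPTH


def scheme_undivided(x, y):
    """Assumed scheme: area A scores 3, the rest of the court 1, out 0."""
    if not in_court(x, y):
        return 0
    return 3 if y >= AREA_A_START else 1


def scheme_corners(x, y):
    """Assumed scheme: corners of area A score 5, its center 3, elsewhere 1."""
    if not in_court(x, y):
        return 0
    if y < AREA_A_START:
        return 1
    in_corner = x <= 10.0 or x >= 20.0
    return 5 if in_corner else 3


def score_test(landings, scheme):
    """Total score for one player's recorded serves under one scheme."""
    return sum(scheme(x, y) for x, y in landings)


# One player's ten recorded serves (made-up points of contact).
landings = [(5, 27), (15, 28), (28, 26), (12, 20), (3, 10),
            (15, 31), (22, 29), (8, 24), (16, 26), (31, 27)]

totals = {name: score_test(landings, scheme)
          for name, scheme in [("undivided", scheme_undivided),
                               ("corners", scheme_corners)]}
print(totals)
```

Because the raw points of contact are kept, a further scheme can be tried later simply by writing another scoring function. The students need not be tested again.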

The scoring plan for the badminton serve test (p. 49) was
devised in this way. The test was designed to test ability for a
short service into the front left corner of the right service court. 
The scoring chart was identical with the floor markings. The stu- 
dent knew he was aiming for the corner of the court but knew 
nothing of how it was to be scored. Each service was recorded on 
the chart. A series of scoring schemes was tried with the eventual 
selection of the one given in Chapter 4 since it correlated the 
most highly with the criterion. 

The volley ball tests to be used in this illustrative discussion 
will be called: 

1. Service test #1 

2. Service test #2 

3. Volleying test #1 

4. Volleying test #2 

5. Set-ups 

6. Net recovery 

The details of each test would be determined at this stage. They 
are omitted here for the sake of brevity. 

4. Select and Obtain the Criterion 

Every test must be compared with some criterion as a yard- 
stick of its efficacy. In rare instances the study of available tests 
may have shown some long or complicated measure which is other- 
wise acceptable. The purpose now may be to devise a short form 
which will give similar results with greater economy of time and 
effort. Under these circumstances the long form may be used as
the criterion and administered to each subject. The main value of 
this is that the criterion is more or less objective and of known 
worth. 

However, it more frequently happens that available measures 
are inadequate for such use. The best alternative, then, is to secure 
subjective ratings which are as good as possible. This is usually 
done by having at least three judges or experts rate each subject. 
(See p. 232 for rating procedures.) It is essential that these judges see 
the subjects in action in the activity, not in tests, though the ratings 
should be done at a time not too remote from that in which the 
tests are given. The scale to be used usually contains five categories,
each carefully defined. The judges should be instructed to use all
five categories and not just the middle three, as frequently happens. 
However, knowing the nature of the normal curve it is to be ex- 
pected that there will be comparatively few cases in the first and 
fifth categories and the greatest number in the middle one. 

The various judges each give an independent rating to each 
subject. In utilizing these ratings it is better to use the sum of 
the judges’ ratings than the average as the criterion. This gives 
equal weighting to every judge’s opinion, but this would also be 
true if the average were used. The sum also reduces the mathematical
steps by one, and thereby saves time and eliminates an
opportunity for error. None of these, however, constitutes the 
really important reason for using the sum rather than the mean 
or median. If the median is used, all subjects will fall within the 
five point range; the same is true with the mean unless fractional 
values are used. When the sum is used, the range is greater. On a 
five point scale with three judges, the range is 3 to 15, with four 
judges, 4 to 20. This gives a range more nearly comparable to 
the test score range for computation of the correlation. 
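
The effect of using the sum may be seen in a small sketch. The ratings below are made up for illustration, and the function names are not from any established procedure.

```python
from statistics import median


def criterion_scores(ratings_by_subject):
    """Sum each subject's ratings across judges to form the criterion."""
    return {subject: sum(ratings)
            for subject, ratings in ratings_by_subject.items()}


def possible_range(n_judges, scale_min=1, scale_max=5):
    """Lowest and highest possible sum for a given number of judges."""
    return n_judges * scale_min, n_judges * scale_max


# Made-up ratings from three judges on a five point scale.
ratings = {
    "player_a": [5, 4, 5],
    "player_b": [3, 3, 2],
    "player_c": [1, 2, 1],
}

sums = criterion_scores(ratings)
medians = {subject: median(r) for subject, r in ratings.items()}

print(sums)      # sums spread over the 3 to 15 range
print(medians)   # medians stay within the 1 to 5 range
print(possible_range(3), possible_range(4))
```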

It must be kept clearly in mind that the work connected with 
establishment of a criterion measure is necessary only in a project 
to evaluate or develop a new or comparatively unknown battery. 
It is not part of a regular testing program. 

The agreement obtained between the judges’ ratings will de- 
pend upon the comparative skill and knowledge of the judges, 
upon the clearness of the instructions on aspects to be rated, 
and upon the length of time available for the judges to observe 
the players. Most studies yield a coefficient between .70 and .85 
on agreement of judges. 

In the volley ball example, the criterion could be of either form 
suggested here. There are volley ball batteries available which 
could be given as the basis for comparing new tests. Or, the judges’ 
ratings could be made, while the various teams played. In this ex- 
ample, let us use a rating by three carefully selected judges. Each 
is provided with a detailed description of a five point rating scale. 
Each sees the players on three successive days of playing. 

The agreement between judges as determined by a correlation 
of their ratings could be as follows: 



    Judges 1 and 2    .79
    Judges 1 and 3    .7*
    Judges 2 and 3    .80

These would be considered satisfactory. The sum of the three 
judges' ratings is now used for the player’s score on playing ability,
which is the criterion for evaluating the experimental items. 
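
A minimal sketch of the agreement check follows. It assumes the ordinary product-moment correlation is the one intended, and the ratings shown are made up (eight players, three judges).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("ratings must vary for a correlation to be defined")
    return sxy / sqrt(sxx * syy)


def judge_agreement(ratings_by_judge):
    """Correlate every pair of judges over the same list of players."""
    return {(a, b): pearson_r(ratings_by_judge[a], ratings_by_judge[b])
            for a, b in combinations(sorted(ratings_by_judge), 2)}


# Made-up ratings; position i in each list is the same player.
ratings_by_judge = {
    "judge 1": [1, 2, 3, 3, 4, 5, 2, 4],
    "judge 2": [2, 2, 3, 4, 4, 5, 1, 3],
    "judge 3": [1, 3, 3, 3, 5, 4, 2, 4],
}

agreement = judge_agreement(ratings_by_judge)
for pair, r in agreement.items():
    print(pair, round(r, 2))

# The criterion for each player is the sum of the three ratings.
criterion = [sum(player) for player in zip(*ratings_by_judge.values())]
print(criterion)
```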

5. Select the Subjects to Be Used 

The subjects used in the development of the battery should 
be representative of the group on which you eventually wish to 
use the tests. That means that validation on a college group would 
indicate satisfactory use on a similar college group, but not on a 
high school group unless a similar project also proved its worth 
with the younger age. The experience of the group must be con- 
sidered, also. For example, different results may be obtained on 
beginning and advanced groups of like age. The results may be
similar enough that the battery may be used for both levels but 
it is best to keep the two groups separate for statistical analysis. 

There is no magical number which can be given as the one 
which will give satisfactory results. However, as a generalization 
it can be said that approximately 100 should be considered a 
minimum. The number increases with the variability of the skill 
demonstrated, the lowered reliability of the measures, and the 
confidence which you wish to place in the results. 

In the volley ball example, let us assume that the test is to be 
developed for beginning players in college. The subjects available 
include 130 cases from three classes. All will be used except the 
few who may be absent during the testing and rating periods. 

6. Determine the Reliability of the Experimental Test Items

The ideal method of ascertaining the reliability of the sepa- 
rate items is to administer each in identical form on two successive 
days, and then to correlate the results. If the test is measuring 
consistently it should yield a very high coefficient (at least in the 
.90’s); since this does not permit opportunity for practice between
the two administrations, and it does not permit much change in 



TABLE XXIV

Reliability of Tests Used in the Illustrative Problem of
Test Construction

                           # of     # of      r for       Odd-even   Spearman-
                           cases    trials    repeated    r          Brown r
                                              test
1. Service test #1         130      10                    .81        .90
2. Service test #2         130      20                    .74        .85
3. Volleying test #1       126      10        .84
4. Volleying test #2       125      10                    .50        .67
5. Set-up test             120      20        .80
6. Net recovery test       124      10        .75

Assuming that other groups and situations are similar and that
the tests are given in identical form, comparable results may be
expected.


TABLE XXV

Coefficients Stepped Up by the Spearman-Brown Prophecy Formula as
an Estimate for Twice the Number of Trials

Odd-     Spearman-      Odd-     Spearman-      Odd-     Spearman-
even     Brown          even     Brown          even     Brown

.60*     .75            .70      .82            .80      .89
.61      .76            .71      .83            .81      .90
.62      .77            .72      .84            .82      .90
.63      .78            .73      .84            .83      .91
.64      .78            .74      .85            .84      .91
.65      .79            .75      .86            .85      .92
.66      .80            .76      .86            .86      .93
.67      .80            .77      .87            .87      .93
.68      .81            .78      .88            .88      .94
.69      .82            .79      .88            .89*     .94

* For any values not included in this table compute by use of the formula on p. 241.









physical status or attitude. In general it is better than two repeti- 
tions on the same day because some will be affected more than 
others by fatigue, and it also gives opportunity for transitory fac- 
tors of interest, feelings of physical well-being, “jinx” or “off day” 
performances to enter in. In general these latter factors would 
tend to reduce the coefficient slightly, but these factors do influence 
results when used on student groups; therefore, they should be 
given opportunity to operate when evaluating the tests. 

It is not always possible to give two complete repetitions of the 
test, especially when it is long. In that case a randomly selected 
sample of about 100 subjects may be given the test twice and re- 
liability estimated from this group. In case of the longer tests, how- 
ever, it is probably better to simply correlate half the trials against 
the other half for each subject. This is known as the odd-even 
method. The sum of the scores on the odd numbered trials is corre- 
lated against the sum of the even numbered trials. (See Chap. 2, 
p. 20). This type of splitting is preferable to first and last halves 
as it tends to even out practice or fatigue effects. 
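
The odd-even procedure may be sketched as below. The scores are made up (five subjects, ten trials each), and the ordinary product-moment correlation is assumed.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / sqrt(sxx * syy)


def odd_even_sums(trials):
    """Return (sum of trials 1, 3, 5, ...; sum of trials 2, 4, 6, ...)."""
    return sum(trials[0::2]), sum(trials[1::2])


def odd_even_r(trials_by_subject):
    """Correlate odd-trial sums against even-trial sums over all subjects."""
    halves = [odd_even_sums(trials) for trials in trials_by_subject]
    odd = [h[0] for h in halves]
    even = [h[1] for h in halves]
    return pearson_r(odd, even)


# Made-up scores: one row per subject, one column per trial.
scores = [
    [3, 4, 3, 5, 4, 4, 3, 5, 4, 4],
    [1, 2, 1, 1, 2, 0, 1, 2, 1, 1],
    [5, 5, 4, 5, 5, 4, 5, 5, 4, 5],
    [2, 3, 2, 2, 3, 3, 2, 2, 3, 2],
    [4, 3, 4, 4, 3, 5, 4, 3, 4, 4],
]
print(round(odd_even_r(scores), 2))
```

The coefficient obtained this way applies to a test of half the length actually given, which is why it is ordinarily stepped up before use.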

When the odd-even method is used it is permissible to use the
Spearman-Brown Prophecy formula * to step up the coefficient to
that expected for twice the length used in the correlation, since
this is the actual length of the test to be used in the remaining
statistical analysis and in later
administration. (See Table XXV.) Also, if the reliability is found
to be too low (below .80) the Spearman-Brown formula may be 
used to estimate the reliability for an increased number of trials. 
When the number of trials is determined for the desired reliability, 
the lengthened form may be given to all subjects. Or, if a longer
form is impractical, the test is now discarded because of its low 
reliability. 

It will be seen then that the reliability of the test is partially a 
function of the length of the test. For that reason it is well to plan 
on several trials of any test where there is considerable variation 
in performance and any element of chance involved. This doubtless
explains the reason why beginners frequently require more
trials than advanced players on the same test.

* The Spearman-Brown formula is

    $r_x = \dfrac{N r}{1 + (N - 1) r}$

where

    $r_x$ = coefficient to be estimated
    $N$ = 2 or proportion of increase in length
    $r$ = correlation obtained on the halves
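
The Spearman-Brown formula, and its use in estimating how much a test must be lengthened, can be sketched as follows. The second function is simply the same formula solved for N. The function names are illustrative only.

```python
def spearman_brown(r, n=2.0):
    """Estimated reliability when the test is made n times as long.

    r is the correlation obtained on the shorter form (for the odd-even
    method, the correlation between the two halves, with n = 2).
    """
    return n * r / (1 + (n - 1) * r)


def length_factor_needed(r, target):
    """Factor n by which the shorter form must be lengthened to reach target.

    Obtained by solving target = n*r / (1 + (n - 1)*r) for n.
    """
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("r and target must lie strictly between 0 and 1")
    return target * (1 - r) / (r * (1 - target))


# Odd-even coefficients from the illustrative problem, stepped up to full length.
for r in (0.81, 0.74, 0.50):
    print(r, round(spearman_brown(r), 2))

# How many times the half-length form of volleying test #2 (r = .50)
# would have to be lengthened to reach a reliability of .80.
print(length_factor_needed(0.50, 0.80))
```

In this sketch a half-test coefficient of .50 calls for about four times the half length, that is, twice the number of trials actually given, before a reliability of .80 could be expected.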

It must be kept in mind that there is no such thing as the re- 
liability of a test. It is always for the group and under the circum- 
stances stated. In the volley ball illustration the reliability of these 
tests for the 130 beginning college students is presented in Table 
XXIV. 

7. Determine the Validity of the Test Items 

The reliability should always be computed first and all ex- 
perimental items with very low reliability can be dropped before 
attempting correlation with the criterion. This saves time, as those 
with low reliability invariably yield low validity coefficients and 
would be dropped for that reason. 

The validity of the test is determined by correlating the scores 
of each subject with his criterion score. If previous work has been 
well done and the experimental items are of the right type these 
coefficients will range from about .60 to .80 or .85. Occasionally, 
a single item will be found with a coefficient in the .80’s and may
be considered satisfactory alone. In general, tests of this level are 
not high enough for individual prediction or evaluation. For that 
reason it is usually necessary to combine two or three tests for a 
battery. Those with low validity coefficients are not considered 
further. 

In the volley ball illustration, test 4 has a reliability of only .50. 
It would not be practical or possible to increase its length to secure 
a sufficiently high reliability. Therefore, it will be dropped. 

The validity of a test is also in terms of the group and circum- 
stances under which it is established. Therefore, we will state the 
validity of these tests for the beginning college group of 130 to be 
as follows: 


    1. Service test #1       .780
    2. Service test #2       .760
    3. Volleying test #1     .821
    4. Set-up test           .745
    5. Net recovery          .600




The test on net recovery will be dropped because of low validity. 
The best single test is volleying, and if only one test can be used 
that is the best selection. 
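
The two screening steps may be sketched as below. The coefficients are those of the illustration. For each test the best available reliability estimate is used (the stepped-up value where the odd-even method applied). The two cut-off values are assumptions chosen only so that the sketch reproduces the decisions described in the text; they are not standards laid down here.

```python
# Best available reliability estimate for each experimental item.
reliability = {
    "service test #1": 0.90,
    "service test #2": 0.85,
    "volleying test #1": 0.84,
    "volleying test #2": 0.67,
    "set-up test": 0.80,
    "net recovery test": 0.75,
}

# Correlation of each retained item with the criterion.
validity = {
    "service test #1": 0.780,
    "service test #2": 0.760,
    "volleying test #1": 0.821,
    "set-up test": 0.745,
    "net recovery test": 0.600,
}

MIN_RELIABILITY = 0.70  # assumed cut-off, for illustration only
MIN_VALIDITY = 0.70     # assumed cut-off, for illustration only


def screen(reliability, validity, min_reliability, min_validity):
    """Drop unreliable items first, then items with low validity."""
    reliable = [name for name, r in reliability.items() if r >= min_reliability]
    return [name for name in reliable if validity[name] >= min_validity]


kept = screen(reliability, validity, MIN_RELIABILITY, MIN_VALIDITY)
best_single = max(kept, key=lambda name: validity[name])
print(kept)
print(best_single)
```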


8. Compute the Intercorrelation of the Experimental Items


During these successive steps the number of experimental 
items is gradually reduced. The low reliability and low validity 
items have already been discarded. Likewise, it is not helpful to 
try to add measures which are highly related or measure the same 
thing. This necessitates correlating each item retained at this stage 
with every other item. 

In this example, the intercorrelations for the volley ball tests 
follow: 



                               1      2      3

    1. Service test #1
    2. Service test #2        .70
    3. Volleying test         .25    .40
    4. Set-up test            .50    .45    .30


As would be expected the two service tests correlate rather 
highly. The other tests show low but varying degrees of relation- 
ship. 


9. Combine Items and Obtain Multiple Correlation with the Criterion

The logic for combining items should now be clear. Each
item to go into a battery should have a relatively high validity 
coefficient, but it should have a minimum relationship to the other 
items in the battery. 

The multiple correlation may be computed by the usual method 
referred to in any text on statistics which covers advanced corre- 
lation technique. There is a short cut to that computation which 
consolidates some of the intermediate steps. A sample of the com- 
putational sheet to be used is shown in Table XXVI. 

The multiple correlation is a process of obtaining the best pos- 
sible combination with the criterion, and the degree of relationship 
between that combination and the criterion. Let us use the follow- 
ing symbols to simplify the discussion. 




R— multiple correlation coefficient 

0— criterion 

1— service test #1 

2— service test #2 

3— volleying test 

4— set-up test 

You are now ready to decide upon combinations which might 
be plausible. The two service tests are too highly related to select 
both, though either is good alone. Service test #1 and volleying
have a low intercorrelation (.25) and, therefore, should be tried.
By the same reasoning, the plausible combinations, in order of
expected decreasing R, are as follows: 1-3, 2-3, 3-4, 2-4. Other
combinations would yield an R too low to be of value.

The computation of $R_{0.13}$ is given in Table XXVI as an illustration
of the short method of computation.⁹ The coefficients are as
follows and confirm the prediction above:

    $R_{0.13}$ = .970
    $R_{0.23}$ = .945
    $R_{0.34}$ = .896
    $R_{0.24}$ = .813

These coefficients indicate that service test #1 and volleying con- 
stitute the best combination, although the battery with service 
test #2 is almost as good. However, it will be remembered that 
service test #2 has twice as many trials as #1 and would, therefore, 
be impractical to use when #1 is available and of known value. 
The combination of volleying and set-ups also yields a very high 
coefficient. If the set-ups test can be given in less space and time, it 
might be used instead of service test #1 and without question 
would be chosen instead of service test #2. Hence the basis for 
selection is not solely the size of the coefficient; good judgment and 
practical assessment must be applied. 
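
For a battery of only two tests the multiple correlation can be written out directly. The sketch below is not the Doolittle short method of Table XXVI. It uses the standard two-predictor solution for the regression coefficients, and then the formula printed with Table XXVI: R is the square root of the summed products of each regression coefficient and its correlation with the criterion. The function name and the consistency check are additions for illustration.

```python
from math import sqrt


def multiple_r_two_tests(r0a, r0b, rab):
    """Multiple R of a criterion (0) with two tests (a, b).

    r0a, r0b: validity coefficients of the two tests.
    rab: intercorrelation of the two tests.
    Returns (R, beta_a, beta_b), the betas being standard-score weights.
    """
    if not -1.0 < rab < 1.0:
        raise ValueError("intercorrelation must lie strictly between -1 and 1")
    denom = 1 - rab ** 2
    beta_a = (r0a - r0b * rab) / denom
    beta_b = (r0b - r0a * rab) / denom
    r_squared = beta_a * r0a + beta_b * r0b
    if not 0.0 <= r_squared <= 1.0:
        # Rounded or illustrative coefficients need not be mutually consistent.
        raise ValueError("the three coefficients are not mutually consistent")
    return sqrt(r_squared), beta_a, beta_b


# Service test #2 (validity .760) with the volleying test (validity .821),
# intercorrelation .40, as given in the illustration.
R, beta_service, beta_volley = multiple_r_two_tests(0.760, 0.821, 0.40)
print(round(R, 3), round(beta_service, 3), round(beta_volley, 3))
```

With these three coefficients the sketch gives an R of about .946, close to the .945 quoted above for the 2-3 combination. The function refuses any set of coefficients that would imply a squared R outside the range 0 to 1.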

10. Compute the Regression Equations 

Having decided upon the battery to be used, the method of 
combining the tests must be prepared. The regression equation 
takes into account the variability of the raw scores on each test 



TABLE XXVI

Sample Doolittle Sheet with Computation of Multiple Correlation
for Volley Ball Battery*

0 = criterion
1 = serve test
2 = volleying test

Line  Directions                                       a        b        c      d      e      f      x

  1   Insert values for r's                          1.000     .250     r13    r14    r15    r16    -.780
  2   Divide line 1 by -1                           -1.000    -.250                                 +.780

  3   Insert values for r's                                   1.000     r23    r24    r25    r26    -.821
  4   Multiply items in line 1, b to x, by b2                 -.063                                 +.205
  5   Add algebraically lines 3 and 4                         +.937                                 -.616
  6   Divide line 5 by negative b5                                                                  +.657

  7   Insert values for r's                                            1.000   r34    r35    r36    -r03
  8   Multiply items in line 1, c to x, by c2
  9   Multiply items in line 5, c to x, by c6
 10   Add algebraically lines 7, 8, 9
 11   Divide line 10 by negative c10

 12   Insert values for r's                                                   1.000   r45    r46    -r04
 13   Multiply items in line 1, d to x, by d2
 14   Multiply items in line 5, d to x, by d6
 15   Multiply items in line 10, d to x, by d11
 16   Add algebraically lines 12, 13, 14 and 15
 17   Divide line 16 by negative d16

 18   Insert values for r's                                                          1.000   r56    -r05
 19   Multiply items in line 1, e to x, by e2
 20   Multiply items in line 5, e to x, by e6
 21   Multiply items in line 10, e to x, by e11
 22   Multiply items in line 16, e to x, by e17
 23   Add algebraically lines 18, 19, 20, 21, 22
 24   Divide line 23 by negative e23

 25   Insert values for r's                                                                 1.000   -r06
 26   Multiply items in line 1, f to x, by f2
 27   Multiply items in line 5, f to x, by f6
 28   Multiply items in line 10, f to x, by f11
 29   Multiply items in line 16, f to x, by f17
 30   Multiply items in line 23, f to x, by f24
 31   Add algebraically lines 25, 26, 27, 28, 29, 30
 32   Divide line 31 by negative f31

(In line 1 the b entry is r12 and the x entry is -r01; in line 3 the x entry is -r02.)

(This worksheet is set up for combining a maximum of six items with the
criterion. It may be used for any number less than six but becomes really
economical with four or more items.)

Substitute values from above table for symbols in following equations (B's for
each equation found when each variable in turn is solved for) and solve
equations for the regression coefficients, B1, B2, B3, ... Bn.

    B6 = x32
    B5 = (B6)f24 + x24
    B4 = (B6)f17 + (B5)e17 + x17
    B3 = (B6)f11 + (B5)e11 + (B4)d11 + x11
    B2 = (B6)f6 + (B5)e6 + (B4)d6 + (B3)c6 + x6 = +.657
    B1 = (B6)f2 + (B5)e2 + (B4)d2 + (B3)c2 + (B2)b2 + x2
       = (.657 x -.25) + .78 = +.516

Formula for Multiple Correlation

Having found the several regression coefficients, the multiple R is to be found
by the following formula:

    $R_{0.123 \dots n} = \sqrt{B_1 r_{01} + B_2 r_{02} + B_3 r_{03} + \dots + B_n r_{0n}}$

    B1 r01 = .516 x .780 = .402
    B2 r02 = .657 x .821 = .539
                           .941

    R = $\sqrt{.941}$ = .97

* Worksheet quoted from Journal of Educational Research, by permission of the publisher.⁹


and the relative value of each test in the total battery. The formula
for this computation reads:

    $B_1 \frac{\sigma_0}{\sigma_1}$ test 1 $+ B_2 \frac{\sigma_0}{\sigma_2}$ test 2 $+ \dots + B_n \frac{\sigma_0}{\sigma_n}$ test n

As applied to $R_{0.13}$ in the volley ball illustration it reads

    .516 ($\sigma_0/\sigma_1$) service test #1 + .657 ($\sigma_0/\sigma_3$) volleying test =
    .184 service test #1 + .164 volleying test

Likewise, the regression equations for the other batteries will
read

    .076 service test #2 + .154 volleying test
    .177 volleying test + .189 set-ups
    .085 service test #2 + .162 set-ups

However, weightings such as these are too time consuming, and 
fortunately unnecessary. All that is required is to get the proper 
proportion of each test. In these examples simple addition of the 
raw scores would not be far out of line but in some batteries simple 
addition would be completely impossible. For example, if combining
the scores on a running and throwing test, most of the running
scores might vary between 4½ and 8 seconds, while most of the throwing
scores might vary between 25 and 75 feet. Simple addition would 
give undue importance to the throwing score; inadequacy or su- 
periority in running would show up only if the players received 
comparable scores on both tests. 

In order to maintain proper proportions between the respective
tests it is necessary only to compute the relationship between the
weightings and substitute in the equation. For example,

    .184 service test #1 + .164 volleying test

would be changed to

    1.1 service test #1 + 1. volleying test.

This computation is then simple enough to be done without
paper and pencil or tables. It has an advantage over simple addition
of keeping the scores on the two tests in proper proportion.
The simplified versions of the second battery might read

    1. service test #2 + 2. volleying test

or

    .5 service test #2 + 1. volleying test

The ease of computation would be about equal in those two forms.
In the third case,

    1. volleying test + 1.1 set-up test

is much simpler than

    .9 volleying test + 1. set-up test

In the last case again it makes little difference whether you use

    1. service test #2 + 2. set-up test

or

    .5 service test #2 + 1. set-up test

If T-scales are computed for each test separately and the T-scores 
added, this sum may be used instead of a score computed by the 
regression equation. This is an easier process and the results are 
comparable. 
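
The reduction of regression weightings to simple proportions, described above, may be sketched as follows. Dividing by the smallest weighting and rounding to one decimal place is an assumption made for the sketch. It reproduces the 1.1 to 1 form given for the first battery, but other bases (for example, making the larger weighting equal to 1) serve equally well. The raw scores in the usage lines are made up.

```python
def simplify_weights(weights, base=None):
    """Rescale weightings so that the chosen base weighting becomes 1.

    weights: mapping of test name to regression weighting.
    base: weighting to divide by; the smallest one is used if omitted.
    """
    if base is None:
        base = min(weights.values())
    return {name: round(w / base, 1) for name, w in weights.items()}


def battery_score(raw_scores, weights):
    """Weighted sum of one player's raw scores."""
    return sum(weights[name] * raw_scores[name] for name in weights)


first_battery = {"service test #1": 0.184, "volleying test": 0.164}
simple = simplify_weights(first_battery)
print(simple)

# Made-up raw scores for one player.
player = {"service test #1": 20, "volleying test": 30}
print(battery_score(player, simple))
```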

11. Compute Norms for Use of the Tests 

Raw scores are not very meaningful to either the teacher or 
the student. Therefore, it is considered an essential part of any 
project in which tests are being developed that standards of some 
sort be devised. A very convenient form is the T-scale. This procedure
has been discussed above.
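
As a rough illustration only, the sketch below converts raw scores to a scale with a mean of 50 and a standard deviation of 10. This linear form is an assumption. The T-scale procedure discussed earlier may be constructed differently (for instance, from the percentage of cases exceeded), in which case the values would not agree exactly. The raw scores are made up.

```python
from statistics import mean, pstdev


def linear_t_scores(raw_scores):
    """Convert raw scores to a scale with mean 50 and standard deviation 10.

    ASSUMPTION: a simple linear conversion, not necessarily the T-scale
    procedure described elsewhere in this manual.
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        raise ValueError("scores must vary to be scaled")
    return [50 + 10 * (x - m) / sd for x in raw_scores]


# Made-up raw scores on two tests for the same five players.
service = [12, 18, 25, 30, 15]
volleying = [20, 22, 35, 41, 27]

service_t = linear_t_scores(service)
volleying_t = linear_t_scores(volleying)

# Battery score as the sum of the two scaled scores for each player.
battery = [round(a + b, 1) for a, b in zip(service_t, volleying_t)]
print(battery)
```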


BIBLIOGRAPHY 

1. Bovard, John F., and Cozens, Frederick W.: Tests and Measurements in Physi- 
cal Education, Second Edition. W. B. Saunders Company, New York, 1938, 
Chapters 13, 14, 18 

2. Garrett, Henry E.: Statistics in Psychology and Education. Longmans, Green &
Company, New York, 1935, Chapter 6

3. Glassow, Ruth B., and Broer, Marion R.: Measuring Achievement in Physical 
Education. W. B. Saunders Company, 1938, Chapters 1 and 2 

4. Holzinger, Karl J.: Statistical Methods for Students in Education. Ginn &
Company, New York, 1928

5. Lindquist, E. F.: A First Course in Statistics. Houghton Mifflin Company, 
Cambridge, 1938 

6. Lindquist, E. F., and Cook, Walter W.: “Experimental Procedures in Test
Evaluation,” Journal of Experimental Education, 1, 1933, p. 163

7. Long, John A., and Sandiford, Peter: “Validity of Test Items,” Bulletin of
Department of Educational Research, University of Toronto, 3, 1935, p. 118

8. McCloy, C. H.: Tests and Measurement in Health and Physical Education,
2d edition. F. S. Crofts & Company, New York, 1942, Chapter 25

9. Peters, Charles C., and Wykes, Elizabeth Crossley: “Simplified Methods for
Computing Regression Coefficients and Partial and Multiple Correlations,”
Journal of Educational Research, 3, May 1931, p. 383

10. Swineford, Frances: “Validity of Test Items,” Journal of Educational Psy- 
chology, 27, 1936, p. 68 





